Know Your Cloud (KYC): Azure Databricks Spot Instances

Ever wonder how to slash your Databricks costs? Look no further than spot instances!

What are Spot Instances?

Think of them as discounted virtual machines that cloud providers such as Azure, AWS, and GCP sell from their spare capacity. They can cost up to 90% less than on-demand instances, but with a catch: the cloud provider can reclaim them at any time.

The Benefits:

  • Massive Cost Savings: Pay way less for your worker nodes, freeing up budget for other things.
  • Scale Up, Down, and Around: Easily adjust your infrastructure based on your needs without upfront costs.
  • Variety is King: Choose from different instance types and configurations to suit your specific tasks.

The Trade-offs:

  • Interruption Alert! Spot instances can be shut down anytime, so be prepared.
  • Not for the Faint of Heart: Not ideal for mission-critical workloads that need 24/7 uptime.
  • A Little Extra TLC: Requires some configuration and monitoring to handle interruptions gracefully.
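As one example of that extra configuration, here is a minimal sketch for a cluster defined with the Terraform databricks_cluster resource. It turns on Spark's decommissioning settings, which ask executors on a node that is being reclaimed to migrate their shuffle and cached data before the node goes away. The resource name, cluster name, runtime version, node type, and worker count are assumptions chosen for illustration, and we have not benchmarked how much these settings help on any particular workload.

```hcl
# Sketch only: every literal value below is an illustrative assumption.
resource "databricks_cluster" "spot_with_decommission" {
  cluster_name  = "spot-decommission-example" # hypothetical name
  spark_version = "15.4.x-scala2.12"          # assumed runtime version
  node_type_id  = "Standard_DS3_v2"           # assumed Azure node type
  num_workers   = 2

  # Standard Spark (3.1+) decommissioning properties.
  spark_conf = {
    "spark.decommission.enabled"                       = "true"
    "spark.storage.decommission.enabled"               = "true"
    "spark.storage.decommission.shuffleBlocks.enabled" = "true"
    "spark.storage.decommission.rddBlocks.enabled"     = "true"
  }
}
```

Monitoring is still needed on top of this: decommissioning only helps when the node gets enough warning before it is shut down.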

Here’s the Catch (for Azure Only):

Microsoft requires at least one node (usually the driver) to run at full price in Azure Databricks. So while you can save on worker nodes, the driver is always charged at the standard rate.

How to Enable?

If you provision the cluster with Terraform, you will be using the databricks_cluster resource. It has a block specific to spot instances, and that block varies by cloud provider.

Cloud Provider and Terraform Code

Azure:

    azure_attributes {
      availability       = "SPOT_WITH_FALLBACK_AZURE"
      first_on_demand    = 1
      spot_bid_max_price = 100
    }

AWS:

    aws_attributes {
      availability           = "SPOT"
      zone_id                = "us-east-1a"
      first_on_demand        = 1
      spot_bid_price_percent = 100
    }

GCP:

    gcp_attributes {
      availability = "PREEMPTIBLE_WITH_FALLBACK_GCP"
      zone_id      = "AUTO"
    }

Terraform Spot Instance Code
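To show where such a block lives, here is a minimal sketch of a complete Azure cluster definition. The resource name, cluster name, autoscale range, and auto-termination timeout are illustrative assumptions, not values from our setup. The two data sources simply let the provider pick a runtime and a node type, and provider authentication is assumed to be configured elsewhere (for example through environment variables).

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Assumption: use the latest long-term-support runtime.
data "databricks_spark_version" "lts" {
  long_term_support = true
}

# Assumption: use the smallest node type that has a local disk.
data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_cluster" "spot_example" {
  cluster_name            = "spot-example" # hypothetical name
  spark_version           = data.databricks_spark_version.lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  autoscale {
    min_workers = 1
    max_workers = 4
  }

  azure_attributes {
    # Use spot VMs, falling back to on-demand when no spot capacity is available.
    availability = "SPOT_WITH_FALLBACK_AZURE"
    # The first node (the driver) stays on an on-demand VM.
    first_on_demand = 1
    # Maximum spot price; same value as in the Azure snippet above.
    spot_bid_max_price = 100
  }
}
```

With first_on_demand = 1, only the workers are candidates for spot pricing, which matches the Azure restriction described earlier.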

If you prefer to do this in the UI, open the cluster configuration; the spot instance option is under the worker configuration.

Our Experience:

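Enabling spot instances made no difference to the cost of our single-node Databricks jobs, because the only node there is the driver and it is charged at the standard rate. Spot instances truly shine in multi-node clusters, where the savings on worker nodes can be substantial!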

The Takeaway:

  • Azure has limitations: At least one node must be charged at the standard rate.
  • AWS and GCP offer full flexibility: Run all nodes as spot instances.

Ready to save? Consider spot instances for your Databricks workloads, but keep in mind the trade-offs and Azure’s limitations.

Reference

  • https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/cluster
