
If you are on Databricks and not using Asset Bundles, you are missing out. If you are on a different data platform - you are also missing out, and probably wish you were on Databricks. None of the other platforms have anything quite equivalent that makes development and deployment workflows this smooth.
Databricks Asset Bundles (DABs) allow you to deploy code, workflows, and workspace-level resources as a single unit by declaratively defining them in YAML files. They bridge the gap between higher-level infrastructure management (where tools like Terraform or OpenTofu shine) and UI-driven development in the Databricks workspace.
And let’s be honest - YAML is king these days!
Targets enable reproducible deployments across multiple environments, very similar to dbt profiles. The same bundle can be deployed to dev, test, or prod with environment-specific configuration.
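As a sketch (the host URLs and target names here are illustrative, not from a real workspace), a `targets` block might look like this:

```yaml
# databricks.yml (fragment) - one bundle, multiple deployment targets
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com/
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com/
```

You then pick the environment at deploy time with `databricks bundle deploy --target prod`, much like selecting a dbt profile target.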
Variables let you control deployment behaviour per environment. For example, you might deploy a pipeline in a paused state in dev, but automatically unpause it in production.
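A minimal sketch of that pattern, assuming a job with a schedule (the variable and job names are made up for illustration): the default keeps the schedule paused, and only the prod target overrides it.

```yaml
# databricks.yml (fragment) - variable with a per-target override
variables:
  schedule_pause_status:
    description: Whether scheduled jobs run automatically
    default: PAUSED

targets:
  prod:
    variables:
      schedule_pause_status: UNPAUSED

resources:
  jobs:
    nightly_job:
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
        pause_status: ${var.schedule_pause_status}
```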
In development mode, multiple users can work on the same bundle without stepping on each other’s toes. Databricks automatically prefixes deployed resources with the developer’s username, making it easy to distinguish who owns what.
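This behaviour comes from setting the target's mode (the host below is a placeholder):

```yaml
# databricks.yml (fragment) - development mode enables per-user prefixing
targets:
  dev:
    mode: development
    workspace:
      host: https://xxx.cloud.databricks.com/
```

With `mode: development`, a job named `sample_job` is deployed with a prefix like `[dev jane_doe] sample_job`, so two developers deploying the same bundle get separate copies of every resource.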
One of the most underrated features: you can write notebooks as .py or .sql files, as long as they include the Databricks notebook header.
Example `sample_task.py`:

```python
# Databricks notebook source

# COMMAND ----------

# DBTITLE 1,Set the parameters
dbutils.widgets.text("environment", "dev")
env = dbutils.widgets.get("environment")

# COMMAND ----------

# DBTITLE 1,Create Schemas
# Create control schema in bronze catalog
spark.sql(f"create schema if not exists {env}_bronze.control")
```
And the job definition that calls this notebook:

```yaml
resources:
  jobs:
    sample_job:
      name: sample_job
      email_notifications:
        on_failure:
          - ${var.email_notifications}
      tasks:
        - task_key: sample_task
          notebook_task:
            notebook_path: "sample_task.py"
```
This approach gives you the best of both worlds: the interactive notebook experience inside the workspace, and plain text files in source control that produce clean diffs and sensible code review.
DABs encourage truly config-driven workflows. You can build reusable Python logic that reads from configuration files defining tables, sources, or pipelines. Adding a new table often means changing config only - no need to touch the core code.
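A minimal sketch of the idea, with Spark stripped out so it runs anywhere (the config shape, schema naming, and function are all illustrative, not a Databricks API): tables live in config, and adding one means adding a config entry, not code.

```python
# A config-driven ingestion step: tables are declared as data,
# and the code below never changes when a new table is added.
import json

CONFIG = """
{
  "tables": [
    {"name": "customers", "source_path": "/mnt/raw/customers"},
    {"name": "orders",    "source_path": "/mnt/raw/orders"}
  ]
}
"""

def build_statements(config_text: str, env: str) -> list[str]:
    """Turn the table config into CREATE TABLE statements for one environment."""
    config = json.loads(config_text)
    return [
        f"CREATE TABLE IF NOT EXISTS {env}_bronze.{t['name']} "
        f"USING DELTA LOCATION '{t['source_path']}'"
        for t in config["tables"]
    ]

statements = build_statements(CONFIG, env="dev")
for stmt in statements:
    print(stmt)
    # In a real notebook you would execute each one: spark.sql(stmt)
```

In a bundle, the config file would be deployed alongside the notebook, so dev and prod read the same definitions but target different catalogs via the `env` parameter.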
Asset Bundles can manage permissions alongside resource creation, both at the workspace level and per resource. Example workspace-level permissions in a target:
```yaml
targets:
  dev:
    workspace:
      host: https://xxx.cloud.databricks.com/
      root_path: /Shared/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - group_name: dev_admin_group
        level: CAN_MANAGE
      - group_name: dev_developer_group
        level: CAN_VIEW
```
And an example of per-resource permissions, here on a dashboard:
```yaml
resources:
  dashboards:
    pipelines_dashboard:
      display_name: Pipelines Dashboard
      warehouse_id: ${var.warehouse_id}
      file_path: config/pipelines_dashboard.lvdash.json
      embed_credentials: true
      permissions:
        - group_name: ${var.env_code}_developer_group
          level: CAN_RUN
        - group_name: ${var.env_code}_admin_group
          level: CAN_RUN
```
`databricks bundle validate` checks your YAML for syntax and correctness before deployment. It's ideal for CI/CD pipelines to prevent configuration errors from sneaking in.
This validation complements - but does not replace - testing and validation of your SQL and Python logic.
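As one way to wire this into CI (a sketch using GitHub Actions; the workflow name, secret names, and action versions are assumptions), validation runs on every pull request:

```yaml
# .github/workflows/validate-bundle.yml (illustrative)
name: validate-bundle
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate --target dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```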
You don’t need to write everything from scratch. Many supported resources can be created (or partially created) in the Databricks UI and then exported to YAML. You’ll usually need to replace hard-coded values with variables, but it’s a great way to experiment and discover available parameters.
Beyond Databricks’ starter templates, you can create your own. This is especially useful in larger organisations where you want to enforce standards and reuse common patterns from day one.
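A custom template is essentially a directory (or Git repo) containing a `databricks_template_schema.json`, which defines the wizard's prompts, plus a `template/` folder of files rendered with Go templating. A minimal schema sketch (the property name and default are illustrative):

```json
{
  "properties": {
    "project_name": {
      "type": "string",
      "description": "Name of the new bundle",
      "default": "my_project"
    }
  }
}
```

Developers then scaffold a standards-compliant project with `databricks bundle init <path-or-git-url-of-template>`.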
Something akin to dbt's `--select`, allowing only specific resources to be deployed from a bundle. This would be particularly useful in development.
Native Jinja support or conditional resource creation would unlock even more flexibility.
Getting started with Asset Bundles is refreshingly simple.
Follow one of the official Databricks tutorials.
All you need is a Databricks account and a terminal.
```shell
databricks bundle init
```

This launches a friendly wizard that helps you choose a template and configure your project.
Before deploying, validate your bundle:

```shell
databricks bundle validate
```

Deploy to dev:

```shell
databricks bundle deploy --target dev --profile dev
```

Run a specific job or pipeline:

```shell
databricks bundle run --target dev job_name
```

Destroy resources when you're done:

```shell
databricks bundle destroy --target dev
```
Databricks provides a solid and continually growing set of resources to help you get up to speed with Asset Bundles.
The current list of supported bundle resources is available in the official documentation. New resource types are added regularly, so it’s worth checking back as Asset Bundles continue to evolve.
In addition, Databricks maintains a set of sample Asset Bundles that demonstrate common patterns and best practices across different use cases. These examples are a great reference when you're trying to understand how a particular resource is defined in YAML or how multiple resources fit together in a real-world bundle.
Databricks Asset Bundles are a powerful addition to the Databricks ecosystem. They simplify development workflows, encourage best practices, and make collaboration significantly easier. Combined with source control and Terraform for underlying infrastructure, they complete the Databricks resource management story.
Here at Mechanical Rock, we love working with Databricks. If you have Databricks workflows or challenges you’d like help with, please don’t hesitate to get in touch!