We recently helped a client escape a painful Redshift anti-pattern: AWS EventBridge rules fired Lambda functions that invoked stored procedures to fully refresh all silver table data. No incrementality. No intelligence. Just brute-force computation on a schedule, regardless of whether source data had changed.
When we migrated them to Databricks, we replaced this entire orchestration mess with a single feature: materialized views.
The old way: orchestration overhead
The Redshift architecture looked something like this:
```
EventBridge Rule (cron)
  → Lambda Function
    → Redshift Stored Procedure
      → Full refresh of silver tables
        → Hours of unnecessary computation
```
Every trigger meant:
- Re-reading all source data
- Re-computing all transformations
- Overwriting all target data
- Paying for compute whether data changed or not
Add multiple tables to this pattern and you've got a scheduling nightmare. Which Lambda runs first? What if one fails? How do you handle dependencies? You end up building a state machine or Step Function just to orchestrate basic data refreshes.
The new way: let Databricks do the work
With Databricks materialized views, the entire orchestration layer disappears. Here's a real dbt example from the migration:
```sql
{{
  config(
    materialized='materialized_view',
    schedule=none if target.name == 'dev' else {
      'cron': '0 0 */2 ? * *'
    },
    alias='cycle_summaries'
  )
}}

SELECT
  cycle_id,
  equipment_id,
  start_time,
  end_time,
  DATEDIFF(minute, start_time, end_time) AS duration_minutes,
  total_volume,
  avg_pressure
FROM {{ ref('bronze_cycles') }}
WHERE status = 'COMPLETED'
```
That's it. No Lambda. No EventBridge. No stored procedure. The materialized view:
- Automatically detects when source data changes (feature in progress)
- Incrementally refreshes only affected rows
- Runs on the schedule you define in the view itself
- Manages dependencies through dbt's ref() function
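Dependency management really does come for free: a downstream model just `ref()`s the materialized view and dbt resolves the build order. A hedged sketch (the gold model name and daily schedule here are hypothetical, and `ref()` targets the model's file name, assumed to be `cycle_summaries.sql`, not its alias):

```sql
-- Hypothetical downstream model: gold_equipment_daily.sql
{{
  config(
    materialized='materialized_view',
    schedule=none if target.name == 'dev' else {'cron': '0 0 3 * * ?'}  -- daily at 3am
  )
}}

SELECT
  equipment_id,
  DATE(start_time) AS cycle_date,
  COUNT(*) AS completed_cycles,
  SUM(duration_minutes) AS total_runtime_minutes
FROM {{ ref('cycle_summaries') }}
GROUP BY equipment_id, DATE(start_time)
```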
Key benefits we observed
1. Incremental by default. Databricks materialized views track which source rows have changed and only recompute affected downstream rows. In our client's case, this reduced refresh times from hours to minutes.
2. Schedule in the dbt file. The schedule lives with the dbt view definition, versioned in git:
```sql
schedule = {
  'cron': '0 0 */2 ? * *'  # Every 2 hours
}
```
Development environments can disable automatic refreshes entirely:
```sql
schedule = none if target.name == 'dev' else { ... }
```
This pattern is gold for multi-environment deployments.
3. No external orchestration. Databricks handles the scheduling, dependency resolution, and refresh logic. You write SQL. The platform does the rest.
4. Built-in observability. Every materialized view exposes metadata about refresh history, duration, and row counts through system tables. No need to instrument Lambda functions or parse CloudWatch logs.
5. Cost efficiency. You only pay for compute when data actually changes. Contrast this with EventBridge triggers that fire regardless of whether there's new data to process.
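On the observability point, Databricks exposes a materialized view's refresh history through the `event_log()` table-valued function. A minimal sketch (the catalog and schema names are hypothetical, and the exact columns available can vary by runtime):

```sql
-- Inspect recent refresh events for a materialized view
SELECT
  timestamp,
  event_type,
  message
FROM event_log(TABLE(main.silver.cycle_summaries))
ORDER BY timestamp DESC
LIMIT 20
```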
When materialized views aren't the answer
Materialized views aren't a silver bullet:
- Complex aggregations: Very heavy transformations may still be better as scheduled jobs
- Streaming requirements: If you need sub-minute latency, consider Delta Live Tables streaming
- Cross-catalog queries: Materialized views must reference tables in the same metastore
- Schema evolution: Adding columns requires view recreation
For these cases, you might still reach for Databricks Workflows or DLT pipelines. But for the 80% of teams whose use case is "refresh this table when source data changes," materialized views are unbeatable.
Migration strategy
Here's how we approached the Redshift → Databricks migration:
1. Inventory existing stored procedures: understand what each one actually does
2. Identify full vs incremental: which refreshes can be incremental?
3. Convert to dbt models: start with standard views or tables
4. Test refresh logic: validate incremental behaviour in dev
5. Switch to materialized views: enable scheduling once validated
6. Monitor and tune: use Databricks system tables to track performance
7. Decommission orchestration: remove Lambda functions and EventBridge rules
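The move from step 3 to step 5 can be as small as changing one config value. A hedged sketch (hypothetical model, reusing the bronze source from earlier): the logic ships first as a plain dbt table so results can be validated against the old stored procedure, and only the config block changes when flipping to a materialized view.

```sql
-- Step 3 sketch: ship the transformation as a plain table first.
-- Switching to materialized='materialized_view' later is a one-line change.
{{ config(materialized='table') }}

SELECT
  cycle_id,
  equipment_id,
  start_time,
  end_time,
  DATEDIFF(minute, start_time, end_time) AS duration_minutes
FROM {{ ref('bronze_cycles') }}
WHERE status = 'COMPLETED'
```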
The client went from Lambda functions and EventBridge rules to zero external orchestration. Their data freshness actually improved because Databricks refreshes are faster and more frequent.
Takeaways
Materialized views in Databricks represent a fundamental shift in how we think about data transformation scheduling. Instead of building orchestration around data pipelines, we let the data platform handle it natively.
For teams migrating from Redshift or other legacy warehouses with complex scheduling logic, this is a game-changer. Less code. Less infrastructure. Faster refreshes. Lower costs.
If you're drowning in EventBridge rules, Lambda functions, and full-table refreshes, it might be time to consider Databricks materialized views.
At Mechanical Rock, we specialise in modernising data platforms and migration strategies. If you're considering a move to Databricks or want to optimise your existing workloads, let's talk!
Getting started
If you're using dbt with Databricks, materialized views are trivial to adopt:
```sql
{{
  config(
    materialized='materialized_view',
    schedule={'cron': '0 0 * * * ?'}  # Hourly (Quartz format)
  )
}}

SELECT ... FROM {{ ref('source_table') }}
```
One thing to keep in mind is that Databricks uses Quartz cron syntax for its scheduling. It's also worth keeping an eye on future Databricks updates: the TRIGGER ON UPDATE schedule is currently in beta, and once productionised it could further simplify materialized view scheduling.
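For reference, a Quartz expression has six (optionally seven) fields rather than standard cron's five, starting with seconds, and either the day-of-month or day-of-week field must be `?`:

```
┌──────────── second (0-59)
│ ┌────────── minute (0-59)
│ │ ┌──────── hour (0-23)
│ │ │ ┌────── day of month (1-31, or ?)
│ │ │ │ ┌──── month (1-12 or JAN-DEC)
│ │ │ │ │ ┌── day of week (1-7 or SUN-SAT, or ?)
0 0 */2 ? * *    ← every 2 hours, as used earlier in the article
```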