Databricks started as a Spark-based data engineering platform. Its evolution into a full analytics platform — with SQL warehouses for BI workloads, Unity Catalog for unified governance, and Delta Lake as the storage foundation — makes it a credible alternative to purpose-built cloud data warehouses for organisations already on the Databricks platform.
Databricks built its initial reputation as the platform for large-scale Spark-based data engineering and machine learning — the choice for organisations processing data at a scale that traditional data warehouses could not handle. That positioning has broadened substantially. The platform now includes SQL warehouses optimised for BI workloads, Unity Catalog as a unified governance layer across all data assets, and Delta Lake as a storage format that supports both streaming and batch processing on the same data.
For organisations already using Databricks for data engineering or ML, adding a purpose-built data warehouse (Snowflake, Redshift, BigQuery) for analytical workloads is no longer an automatic choice. The answer depends on the workload mix: how much is SQL analytics, and how much is ML, streaming, or large-scale transformation.
Delta Lake: The Storage Foundation
Delta Lake is an open-source storage layer that adds ACID transactions and a transaction log (the `_delta_log` directory stored alongside the data) to Parquet files in object storage (Amazon S3, Google Cloud Storage, Azure Data Lake Storage). Delta tables support:
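**ACID transactions**: Every write is committed atomically to the table's transaction log, so readers always see a consistent snapshot and never a half-finished write. Concurrent writers use optimistic concurrency control: a commit that conflicts with one made in the meantime is rejected rather than corrupting data. These guarantees are scoped to a single table; a job that writes several related tables does not get one atomic transaction across all of them. Streaming writes and batch reads can coexist on the same table without conflicts.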
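To make the commit mechanics concrete, here is a toy, in-memory model of an append-only commit log with optimistic concurrency. It is not Delta Lake code: the `ToyLog` class, its methods, and the file names are hypothetical, and real Delta writes JSON commit files to `_delta_log` in object storage. The point is the protocol: a writer claims version N+1 only if nobody else has, and a reader pinned to a version sees a stable snapshot.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when the version a writer tried to claim is already taken."""


@dataclass
class ToyLog:
    # Hypothetical stand-in for a transaction log:
    # version -> (files added, files removed), one entry per commit.
    commits: dict = field(default_factory=dict)

    def latest_version(self) -> int:
        return max(self.commits, default=-1)

    def snapshot(self, version: int) -> frozenset:
        """Replay commits 0..version to get the data files a reader sees."""
        files = set()
        for v in range(version + 1):
            added, removed = self.commits[v]
            files -= set(removed)
            files |= set(added)
        return frozenset(files)

    def try_commit(self, read_version: int, added=(), removed=()) -> int:
        """Claim version read_version + 1, put-if-absent style, or raise."""
        target = read_version + 1
        if target in self.commits:
            raise CommitConflict(f"version {target} already committed")
        self.commits[target] = (tuple(added), tuple(removed))
        return target


log = ToyLog()
log.try_commit(-1, added=["part-0.parquet"])        # version 0

# Writers A and B both read version 0, then race to commit.
v_a = log.try_commit(0, added=["part-1.parquet"])   # A claims version 1
try:
    v_b = log.try_commit(0, added=["part-2.parquet"])
except CommitConflict:
    # B lost the race: re-read the latest version and retry on top of A's commit.
    v_b = log.try_commit(log.latest_version(), added=["part-2.parquet"])

print(v_a, v_b)                   # 1 2
print(sorted(log.snapshot(0)))    # ['part-0.parquet']
```

Real Delta goes further: after losing a race it checks whether the winning commit touched data the losing transaction depended on, and retries automatically when it did not (two blind appends, for example). The toy treats every collision as a conflict and leaves the retry to the caller.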
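**Time travel**: Because every change is recorded as a new version in the transaction log, you can query a table as of an earlier version (`VERSION AS OF`) or timestamp (`TIMESTAMP AS OF`) for point-in-time analytical comparisons, roll a table back with `RESTORE TABLE` for recovery, and use `DESCRIBE HISTORY` to audit which operation changed the table, when, and by whom. Time travel only reaches versions whose log entries and data files are still retained; log retention settings and `VACUUM` bound how far back you can go.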
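A toy model of the same idea, assuming versions are kept forever (real tables lose old versions to retention settings and `VACUUM`). Each write appends an immutable version with a timestamp and an audit record; reading as of a timestamp picks the latest version committed at or before it; and a restore is itself a new version, so the mistake it undoes stays visible in the history. The class and method names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy versioned table: every write is a new immutable version (not the Delta API)."""
    rows: list = field(default_factory=list)        # version -> rows at that version
    timestamps: list = field(default_factory=list)  # version -> commit time (increasing)
    history: list = field(default_factory=list)     # version -> (user, operation)

    def write(self, rows, ts: int, user: str, operation: str) -> int:
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self.rows.append(list(rows))
        self.timestamps.append(ts)
        self.history.append((user, operation))
        return len(self.rows) - 1

    def as_of_version(self, version: int) -> list:
        return list(self.rows[version])

    def as_of_timestamp(self, ts: int) -> list:
        """Latest version committed at or before ts."""
        version = bisect_right(self.timestamps, ts) - 1
        if version < 0:
            raise ValueError("timestamp precedes the first commit")
        return self.as_of_version(version)

    def restore(self, version: int, ts: int, user: str) -> int:
        """Roll back by committing a copy of an old version as a new version."""
        return self.write(self.rows[version], ts, user, f"RESTORE TO VERSION {version}")


t = VersionedTable()
t.write([("a", 1)], ts=100, user="etl", operation="WRITE")            # version 0
t.write([("a", 1), ("b", 2)], ts=200, user="etl", operation="MERGE")  # version 1
t.write([], ts=300, user="analyst", operation="DELETE")              # version 2: a mistake
print(t.as_of_timestamp(250))   # [('a', 1), ('b', 2)]
v = t.restore(1, ts=400, user="admin")
print(v)                        # 3
print(t.history[2])             # ('analyst', 'DELETE')
```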
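**Schema enforcement and evolution**: By default, Delta enforces the table's schema on write, rejecting writes that contain unexpected columns or incompatible types. When a change is intended, schema evolution (for example, the `mergeSchema` write option) adds new columns without breaking readers of the existing schema.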
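The enforce-by-default, evolve-on-request rule can be sketched as a function over column-name-to-type mappings. This is a simplified model, not Delta's implementation: the `allow_new_columns` flag is a hypothetical stand-in for an opt-in such as `mergeSchema`, and the toy rejects every type change, which is stricter than Delta (it can accept a few safe type widenings).

```python
class SchemaMismatch(Exception):
    """Raised when a write does not fit the table schema."""


def apply_write_schema(table_schema: dict, batch_schema: dict,
                       allow_new_columns: bool = False) -> dict:
    """Return the table schema after a write, or raise SchemaMismatch.

    Schemas map column name -> type name, e.g. {"id": "bigint"}.
    `allow_new_columns` is a hypothetical opt-in playing the role of an
    option like mergeSchema. Columns missing from the batch are allowed
    (they would be written as nulls).
    """
    for col, typ in batch_schema.items():
        if col in table_schema and table_schema[col] != typ:
            raise SchemaMismatch(f"type change on {col!r}: {table_schema[col]} -> {typ}")
    new_cols = [c for c in batch_schema if c not in table_schema]
    if new_cols and not allow_new_columns:
        raise SchemaMismatch(f"unexpected columns: {new_cols}")
    # Evolution only appends columns, so readers of the old columns are unaffected.
    return {**table_schema, **{c: batch_schema[c] for c in new_cols}}


orders = {"id": "bigint", "amount": "double"}
batch = {"id": "bigint", "amount": "double", "currency": "string"}

try:
    apply_write_schema(orders, batch)   # enforcement: rejected
except SchemaMismatch as err:
    print(err)                          # unexpected columns: ['currency']

print(apply_write_schema(orders, batch, allow_new_columns=True))
# {'id': 'bigint', 'amount': 'double', 'currency': 'string'}
```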
**Merge (upsert)**: Delta's `MERGE INTO` supports full upsert semantics: update target rows that match the source on a join condition, insert source rows that have no match, and optionally delete matched rows. This makes incremental processing patterns, such as applying a change-data-capture feed to a table, straightforward.
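The three `MERGE INTO` branches can be modeled on an in-memory table keyed by a primary key. This is a toy sketch, not Spark or Delta code; the function name, the `columns` parameter (playing the role of `UPDATE SET ...`), and the `op` field in the change feed are all hypothetical. It applies a small CDC batch that updates one row, deletes one, and inserts one.

```python
def merge_into(target: dict, source: list, key: str, columns: list,
               delete_when=None) -> dict:
    """Toy MERGE INTO over an in-memory table (not Spark or Delta code).

    target maps key value -> row dict; source is a list of row dicts.
    columns are the source columns copied on update/insert (like UPDATE SET ...).
    delete_when(row) -> bool marks source rows that should delete their match.
    """
    keys = [row[key] for row in source]
    if len(keys) != len(set(keys)):
        # Loosely mirrors MERGE, which errors when several source rows match one target row.
        raise ValueError("source contains duplicate keys")
    result = dict(target)  # leave the caller's table untouched
    for row in source:
        k = row[key]
        delete = delete_when is not None and delete_when(row)
        if k in result and delete:
            del result[k]                                              # WHEN MATCHED AND <cond> THEN DELETE
        elif k in result:
            result[k] = {**result[k], **{c: row[c] for c in columns}}  # WHEN MATCHED THEN UPDATE
        elif not delete:
            result[k] = {key: k, **{c: row[c] for c in columns}}       # WHEN NOT MATCHED THEN INSERT
    return result


customers = {1: {"id": 1, "name": "Anne"}, 2: {"id": 2, "name": "Bob"}}
changes = [
    {"id": 1, "name": "Ann", "op": "upsert"},   # matched -> update
    {"id": 2, "op": "delete"},                  # matched -> delete
    {"id": 3, "name": "Cy", "op": "upsert"},    # not matched -> insert
]
print(merge_into(customers, changes, key="id", columns=["name"],
                 delete_when=lambda r: r["op"] == "delete"))
# {1: {'id': 1, 'name': 'Ann'}, 3: {'id': 3, 'name': 'Cy'}}
```

The duplicate-key check is deliberately blunt: it rejects any repeated source key, whereas real MERGE only fails when several source rows match the same target row. Deduplicating the change feed before merging avoids both.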
Delta is the default table format on Databricks. Spark jobs, SQL warehouses, and Databricks machine learning workloads all read and write the same Delta tables, so one storage format serves every computational pattern on the platform.
SQL Warehouse
The SQL warehouse (formerly called a SQL endpoint) is Databricks' compute resource for BI-optimised query execution. It is separate from standard Databricks compute clusters and is tuned for interactive SQL query workloads rather than Spark job execution.
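SQL warehouses run on Photon, Databricks' native vectorised query engine written in C++. Photon is not the Spark execution engine: it implements Spark SQL semantics but replaces Spark's JVM-based execution for the operators it supports, processing data in columnar batches, and falls back to standard Spark execution for operators it does not support. For typical BI query patterns (aggregations, joins, and window functions on Delta tables) it runs significantly faster than Spark SQL.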
Performance characteristics of SQL warehouses against Delta tables:
- Simple aggregation queries: typically 2-5x faster with Photon than with equivalent Spark SQL, depending on data layout and query shape
- Complex multi-join queries: competitive with purpose-built warehouses for typical analytical patterns
- Concurrent user workloads: a SQL warehouse scales out between a configured minimum and maximum number of clusters as queries queue up; the size of each cluster is fixed by the warehouse's size setting
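The min/max bounding on SQL warehouse scaling, which is configured as a minimum and maximum cluster count, can be sketched as a pure function. The load-to-cluster mapping here is entirely an assumption, including the hypothetical `queries_per_cluster` parameter; Databricks does not expose its scaling rule in this form. What the sketch does reflect is that query load drives the cluster count and the configured range clamps it.

```python
import math


def target_clusters(running: int, queued: int, min_clusters: int,
                    max_clusters: int, queries_per_cluster: int = 10) -> int:
    """Hypothetical scale-out rule, not Databricks' actual algorithm: provision
    enough clusters for the current query load, clamped to the configured range."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("expected 1 <= min_clusters <= max_clusters")
    wanted = math.ceil((running + queued) / queries_per_cluster)
    return max(min_clusters, min(max_clusters, wanted))


for running, queued in [(3, 0), (10, 15), (40, 60)]:
    print(running, queued, target_clusters(running, queued, min_clusters=1, max_clusters=4))
# 3 0 1
# 10 15 3
# 40 60 4
```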
For organisations running BI tools (Tableau, Power BI, Looker) against Databricks, SQL warehouses, reached through the Databricks JDBC/ODBC drivers or a tool's native connector, provide the query performance that interactive dashboards need. Tableau and Power BI both include native Databricks connectors that connect directly to a SQL warehouse using its server hostname and HTTP path.
Unity Catalog
Unity Catalog is Databricks' unified governance layer. It became generally available in 2022 and is now the standard for new Databricks deployments. It provides:
**Three-level namespace**: Assets are identified as catalog.schema.table, equivalent to database.schema.table in traditional warehouses. The Unity Catalog metastore is defined at the account level rather than per workspace (one metastore per region), so the same catalogs can be made available to multiple Databricks workspaces.
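How a partially qualified name is expanded can be sketched in a few lines, assuming the session has a current catalog and schema (set in Databricks SQL with `USE CATALOG` and `USE SCHEMA`). The function name and the example catalog and schema names are hypothetical, and the sketch ignores backtick-quoted identifiers that contain dots.

```python
def resolve(name: str, current_catalog: str, current_schema: str) -> tuple:
    """Expand a 1-, 2- or 3-part table name to (catalog, schema, table).

    Simplified: ignores backtick-quoted identifiers that contain dots.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"not a valid table name: {name!r}")
    defaults = [current_catalog, current_schema]
    # Missing leading parts come from the session's current catalog/schema.
    return tuple(defaults[: 3 - len(parts)] + parts)


print(resolve("orders", "main", "sales"))                 # ('main', 'sales', 'orders')
print(resolve("finance.invoices", "main", "sales"))       # ('main', 'finance', 'invoices')
print(resolve("prod.finance.invoices", "main", "sales"))  # ('prod', 'finance', 'invoices')
```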
**Centralised access control**: Column-level security, row-level security via row filters, and dynamic masking of sensitive columns via column masks are all defined in Unity Catalog and enforced consistently on Unity Catalog-enabled compute, regardless of which workspace, cluster, or SQL warehouse accesses the data.
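In Unity Catalog, row filters and column masks are SQL user-defined functions attached to a table, commonly checking the querying user's group membership. The Python below only models the evaluation order under that assumption: filter rows first, then mask columns in the rows that remain. The `User` type, group names, and policy functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


def region_filter(user: User, row: dict) -> bool:
    """Hypothetical row filter: admins see every row, others only their regions."""
    return "admins" in user.groups or f"region_{row['region']}" in user.groups


def mask_email(user: User, value: str) -> str:
    """Hypothetical column mask: only members of pii_readers see raw emails."""
    return value if "pii_readers" in user.groups else "***"


def run_query(user: User, rows: list, row_filter, masks: dict) -> list:
    """Apply the row filter, then each column's mask, for the querying user."""
    visible = [row for row in rows if row_filter(user, row)]
    return [
        {col: masks[col](user, val) if col in masks else val for col, val in row.items()}
        for row in visible
    ]


customers = [
    {"id": 1, "region": "eu", "email": "a@example.com"},
    {"id": 2, "region": "us", "email": "b@example.com"},
]
analyst = User("ana", frozenset({"region_eu"}))
admin = User("root", frozenset({"admins", "pii_readers"}))
print(run_query(analyst, customers, region_filter, {"email": mask_email}))
# [{'id': 1, 'region': 'eu', 'email': '***'}]
```

Because the policy lives with the table rather than in each query, every caller, whatever tool or workspace it comes from, passes through the same two steps.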
**Lineage tracking**: Unity Catalog automatically captures lineage down to the column level: which upstream tables feed each table, and which upstream columns each column is derived from. Lineage is recorded from reads and writes run on Databricks compute, without manual documentation.
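Unity Catalog records the lineage edges itself; using them is graph traversal. A typical use is impact analysis: given a column in a reporting table, find every column it depends on, or invert the edges to find everything that depends on a given column. The edges and table names below are hypothetical, stored as a plain dictionary and walked breadth-first.

```python
from collections import deque

# Hypothetical column-level edges: column -> columns it is derived from.
LINEAGE = {
    "main.gold.revenue_daily.revenue": ["main.silver.orders.amount", "main.silver.fx.rate"],
    "main.silver.orders.amount": ["main.bronze.orders_raw.amount_str"],
    "main.silver.fx.rate": ["main.bronze.fx_raw.rate"],
}


def reachable(column: str, edges: dict) -> set:
    """Breadth-first walk collecting every column reachable from `column`."""
    seen = set()
    queue = deque([column])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:   # avoid revisiting shared ancestors
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(edges: dict) -> dict:
    """Flip upstream edges into downstream edges ('what depends on this column?')."""
    inverted = {}
    for child, parents in edges.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted


# Upstream: everything the revenue column is derived from.
print(sorted(reachable("main.gold.revenue_daily.revenue", LINEAGE)))
# Downstream: everything affected if the raw FX rate changes.
print(sorted(reachable("main.bronze.fx_raw.rate", invert(LINEAGE))))
```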
**External data sources**: Unity Catalog governs not just Delta tables but also external tables (Parquet, CSV, or JSON files in object storage that are registered in Unity Catalog), views, ML models, volumes (arbitrary files in object storage), and connections to external databases. One governance layer covers the full data estate.
**Fine-grained sharing**: Delta Sharing, an open protocol that Databricks manages through Unity Catalog, shares specific tables or schemas with external organisations or across cloud accounts without copying data. Recipients do not need to be on Databricks.
For organisations that previously maintained separate governance for Databricks data versus warehouse data versus ML feature store, Unity Catalog provides a path to unified governance — one access control model, one lineage map, one catalog for all data assets.
The Lakehouse Pattern
Databricks' "lakehouse" concept describes combining the flexibility of a data lake (arbitrary data formats, direct object storage, streaming, ML workloads) with the governance and query performance of a data warehouse (structured schema, ACID transactions, SQL analytics).
Delta Lake provides the transactional layer. Unity Catalog provides governance. SQL warehouse + Photon provides query performance. The result is a platform that can handle:
- Streaming ingestion from Kafka → Delta table (using Structured Streaming)
- Batch ETL/ELT transformation in Spark or dbt against Delta tables
- SQL analytics by BI tools against SQL warehouses
- ML training against the same Delta tables that feed the BI layer
- Feature store backed by Delta tables for consistent training and serving features
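The core of "one table, many workloads" can be modeled in a few lines: a streaming-style consumer keeps a checkpoint of the last table version it processed and reads only newer commits, while a batch reader (a dashboard query or an ML training job) reads the full current snapshot of the same table. This toy assumes an append-only table, the case a Delta streaming source handles most directly; the class names are hypothetical and nothing here is the Structured Streaming API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTable:
    """Toy append-only table: each commit is the list of rows it appended."""
    commits: list = field(default_factory=list)

    def append(self, rows) -> int:
        self.commits.append(list(rows))
        return len(self.commits) - 1  # the new version number

    def snapshot(self) -> list:
        """Batch read (dashboard, ML training): all rows at the latest version."""
        return [row for commit in self.commits for row in commit]

    def changes_since(self, version: int):
        """Incremental read: rows committed after `version`, plus the new latest version."""
        rows = [row for commit in self.commits[version + 1:] for row in commit]
        return rows, len(self.commits) - 1


@dataclass
class IncrementalConsumer:
    """Streaming-style reader that remembers the last version it processed."""
    checkpoint: int = -1

    def poll(self, table: SharedTable) -> list:
        rows, self.checkpoint = table.changes_since(self.checkpoint)
        return rows


orders = SharedTable()
stream = IncrementalConsumer()
orders.append([{"order": 1}])
print(stream.poll(orders))          # [{'order': 1}]
orders.append([{"order": 2}, {"order": 3}])
print(stream.poll(orders))          # [{'order': 2}, {'order': 3}]
print(stream.poll(orders))          # []
print(len(orders.snapshot()))       # 3
```

Both readers work off the same commits, so there is no copy to keep in sync: the incremental consumer and the batch reader differ only in which versions they choose to read.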
In a traditional two-tier architecture (data lake for raw/ML data, warehouse for analytics), data must be moved between tiers, creating duplication, synchronisation lag, and governance complexity. The lakehouse pattern eliminates the tier boundary — all workloads operate against the same data in the same storage.
When Databricks Is the Right Analytics Platform
Databricks is the natural analytics platform choice when:
- The organisation is already using Databricks for data engineering or ML and adding a separate warehouse creates unnecessary platform proliferation
- Workloads span structured SQL analytics, ML training, and streaming in a way that benefits from unified compute and storage
- Data volumes or transformation complexity justify Spark-scale compute even for the analytics layer
- Unity Catalog's unified governance across all data asset types (tables, files, models, features) is valuable
Databricks is a less natural fit when:
- The team is SQL-centric with no ML or Spark workloads — a purpose-built warehouse (Snowflake, BigQuery, Redshift) has a simpler operational model and often lower total cost for pure SQL analytics
- The BI tool integration requirements favour warehouses with stronger native connector support
- The organisation prefers its cloud provider's native warehouse (BigQuery on GCP, Redshift on AWS) over Databricks' multi-cloud model
Our cloud engineering and data architecture practice evaluates platform fit for each organisation's specific workload profile — contact us to discuss whether Databricks is the right analytics platform for your environment.