Databricks Unity Catalog: What It Is and How to Implement It

Unity Catalog is Databricks' centralised governance layer for data and AI assets. Here is what it provides, how it differs from the legacy Hive metastore, and how to implement it in a production Databricks environment.

The quick answer

Unity Catalog is Databricks' unified governance layer for data and AI assets — tables, files, ML models, notebooks, dashboards, and functions. It replaces the legacy per-workspace Hive metastore with a centralised, account-level catalog that spans all Databricks workspaces. Unity Catalog provides row-level and column-level security, data lineage, audit logs, and a single governance model across all assets. For organisations running Databricks in production, migrating to Unity Catalog is the current recommended architecture and will be required as legacy metastore support is phased out.

What the legacy Hive metastore lacked

The legacy Databricks architecture used a per-workspace Hive metastore — each workspace had its own catalog, and sharing data between workspaces required copying data or complex storage account permissions. Access control was coarse: you could grant access to a database or table but not to specific rows or columns. There was no centralised audit log across workspaces, no lineage tracking, and no way to see all data assets across a Databricks deployment in one place.

Unity Catalog architecture

**Three-level namespace**: Unity Catalog introduces a three-level naming hierarchy: catalog.schema.table (e.g., prod.finance.orders). The catalog is a new top-level container — above the schema (database) and table levels. Each Databricks workspace is associated with a Unity Catalog metastore at the account level.
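To make the naming hierarchy concrete, here is a minimal Python sketch of how a pipeline might normalise table references to the three-level form. The `TableName` class is a hypothetical helper, not part of any Databricks library. It assumes plain identifiers without backtick quoting, and it assumes two-part names should fall back to `hive_metastore`, the catalog under which legacy tables appear in a Unity Catalog workspace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableName:
    """A fully qualified table name: catalog.schema.table (hypothetical helper)."""

    catalog: str
    schema: str
    table: str

    @classmethod
    def parse(cls, name: str, default_catalog: str = "hive_metastore") -> "TableName":
        """Parse a two- or three-part name; two-part names get default_catalog.

        Assumes simple identifiers: backtick-quoted names containing dots
        are not handled in this sketch.
        """
        parts = name.split(".")
        if len(parts) == 3:
            return cls(*parts)
        if len(parts) == 2:
            return cls(default_catalog, *parts)
        raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


# Example: a legacy two-level reference and a three-level reference.
legacy = TableName.parse("finance.orders")
governed = TableName.parse("prod.finance.orders")
print(legacy, governed)
```

Keeping the catalog as an explicit field makes it harder for code to silently read from the wrong environment, for example `dev` instead of `prod`.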

**Metastore**: the account-level container for all Unity Catalog metadata. One metastore per region (typically). All workspaces in the account that are assigned to the metastore share its catalogs, schemas, and tables.

**Catalogs**: the top-level namespace. You can create multiple catalogs — prod, dev, sandbox, a catalog per business domain. Catalogs allow logical separation of environments and domains within the same metastore.

**External locations**: Unity Catalog manages access to cloud storage (S3, ADLS, GCS) through External Location objects. An External Location is a registered cloud storage path (for example, an S3 prefix) paired with a storage credential. Tables defined on top of these External Locations use storage credentials managed by Unity Catalog, not by individual workspace IAM roles. This centralises storage access control.

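**Managed vs external tables**: Managed tables store data in a managed storage location, and Unity Catalog controls the file layout and lifecycle. External tables reference data in registered External Locations. External tables on Delta Lake files in S3 or ADLS suit cases where data must stay at a specific path or be read by other engines: Unity Catalog governs access, but you manage the files yourself. Databricks' own documentation recommends managed tables for most new workloads, so treat external tables as a deliberate choice rather than a default.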
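As a rough illustration of how an External Location and an external table relate, the sketch below builds the two DDL statements as strings. The helper functions and the names `finance_raw`, `finance_cred`, and `s3://example-bucket/finance` are hypothetical, and the storage credential is assumed to exist already. Inputs are not escaped, so this is only suitable for trusted values.

```python
def create_external_location_sql(name: str, url: str, storage_credential: str) -> str:
    """Build DDL that registers a cloud storage path as an External Location."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{storage_credential}`)"
    )


def create_external_table_sql(table: str, location: str) -> str:
    """Build DDL for an external Delta table over files at `location`."""
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"


# Hypothetical names, for illustration only.
statements = [
    create_external_location_sql("finance_raw", "s3://example-bucket/finance", "finance_cred"),
    create_external_table_sql("prod.finance.orders", "s3://example-bucket/finance/orders"),
]
for statement in statements:
    print(statement)
```

The table's path must sit under a registered External Location, which is what lets Unity Catalog, rather than a workspace-level IAM role, decide who can reach the files.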

Access control

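Unity Catalog uses a hierarchical GRANT/REVOKE model applied at the metastore, catalog, schema, or table level. Privileges granted on a parent object are inherited by its children, so a grant on a catalog applies to every schema and table inside it. Unity Catalog has no DENY statement, so the way to give narrower access is to grant at a lower level rather than to override an inherited privilege. The principal types are users (email-based), service principals (for automation), and groups (from the Databricks account or connected identity providers).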

**Common privileges**: USE CATALOG (required to access anything inside the catalog), USE SCHEMA (required to access anything inside the schema), SELECT (read a table), MODIFY (insert, update, or delete rows), CREATE TABLE, ALL PRIVILEGES.
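Because reading a table requires privileges at each level of the hierarchy, a read-only grant is usually three statements rather than one. The sketch below generates them. `grants_for_reader` is a hypothetical helper and the `analysts` group is an assumed name.

```python
def grants_for_reader(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Return the GRANT statements a principal needs to SELECT from one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# Hypothetical group name, for illustration only.
for statement in grants_for_reader("prod", "finance", "orders", "analysts"):
    print(statement)
```

Granting to groups rather than individual users keeps the number of statements manageable as teams change.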

**Row-level security (RLS)**: Row filters restrict which rows a given user or group can see in a table. A row filter is defined as a SQL function that returns a boolean, and it is applied automatically on every query without the consumer needing to add filter conditions.
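A row filter can be sketched in two parts: the SQL that defines and attaches the filter function, and a local simulation of the predicate so the logic can be unit-tested outside Databricks. The function, group, and column names below are hypothetical, and the rule itself (admins see everything, everyone else sees one region) is an assumed example policy.

```python
def row_filter_sql(function: str, table: str, column: str,
                   admin_group: str, allowed_value: str) -> list[str]:
    """Build SQL that creates a boolean filter function and attaches it to a table."""
    return [
        f"CREATE OR REPLACE FUNCTION {function}({column} STRING) "
        f"RETURN is_account_group_member('{admin_group}') OR {column} = '{allowed_value}'",
        f"ALTER TABLE {table} SET ROW FILTER {function} ON ({column})",
    ]


def visible_rows(rows: list[dict], user_groups: set[str], column: str,
                 admin_group: str, allowed_value: str) -> list[dict]:
    """Local simulation of the same predicate, for testing the policy logic."""
    return [
        row for row in rows
        if admin_group in user_groups or row[column] == allowed_value
    ]


# Hypothetical data and names, for illustration only.
orders = [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}]
for statement in row_filter_sql("prod.finance.region_filter", "prod.finance.orders",
                                "region", "admins", "EMEA"):
    print(statement)
print(visible_rows(orders, {"analysts"}, "region", "admins", "EMEA"))
```

The simulation is not what Databricks executes. It only mirrors the boolean logic so that policy changes can be reviewed and tested like any other code.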

**Column masking**: Column masks replace sensitive column values with transformed values (e.g., masking an SSN to show only the last 4 digits) for users without the appropriate privilege. Masks are applied automatically: the consumer sees masked data without knowing the raw value exists.
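The SSN example can be sketched as a mask function plus a local Python equivalent for testing. The SQL is held as strings here. The function name `prod.hr.ssn_mask`, the table `prod.hr.employees`, and the `hr_admins` group are hypothetical, and the `***-**-` output format is an assumption.

```python
# SQL sketch of a mask function and its attachment (hypothetical names).
MASK_FUNCTION_SQL = (
    "CREATE OR REPLACE FUNCTION prod.hr.ssn_mask(ssn STRING) "
    "RETURN CASE WHEN is_account_group_member('hr_admins') THEN ssn "
    "ELSE concat('***-**-', right(ssn, 4)) END"
)
ATTACH_MASK_SQL = "ALTER TABLE prod.hr.employees ALTER COLUMN ssn SET MASK prod.hr.ssn_mask"


def mask_ssn(ssn: str, can_see_raw: bool) -> str:
    """Local equivalent of the mask: keep only the last 4 characters visible."""
    if can_see_raw:
        return ssn
    return "***-**-" + ssn[-4:]


print(mask_ssn("123-45-6789", can_see_raw=False))
print(mask_ssn("123-45-6789", can_see_raw=True))
```

Keeping the decision inside the function means the same table serves both privileged and unprivileged readers without separate views.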

**Delta Sharing**: Unity Catalog integrates with Delta Sharing — Databricks' open protocol for sharing live data across organisations and cloud platforms. A Unity Catalog table can be shared to recipients outside your Databricks account without copying data.

Data lineage

Unity Catalog automatically captures column-level lineage for queries run in Databricks: which tables were read to produce which output columns. Lineage is surfaced in the Databricks UI (Catalog Explorer, formerly Data Explorer) and is available via API. For impact analysis (if I change this source table, what downstream tables are affected?) and compliance (where does this PII column come from?), column-level lineage is a significant capability upgrade over the legacy metastore.
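The impact-analysis question reduces to a graph traversal once lineage has been exported as source-to-target edges. The sketch below assumes such an edge list is already available. How it is exported is left out, and the table names are hypothetical.

```python
from collections import defaultdict, deque


def downstream_tables(edges: list[tuple[str, str]], source: str) -> set[str]:
    """Return every table reachable downstream of `source` via lineage edges."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)

    seen: set[str] = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            # Skip tables already visited and guard against cycles back to source.
            if child not in seen and child != source:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage edges: (source table, target table).
lineage = [
    ("prod.raw.orders", "prod.silver.orders"),
    ("prod.silver.orders", "prod.gold.revenue"),
    ("prod.silver.orders", "prod.gold.churn"),
    ("prod.raw.customers", "prod.silver.customers"),
]
print(sorted(downstream_tables(lineage, "prod.raw.orders")))
```

The same traversal run over reversed edges answers the compliance question of where a given table's data originates.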

**Cross-workspace lineage**: because Unity Catalog is account-level, lineage is captured across workspaces — a table produced in a pipeline workspace and consumed in an analytics workspace is tracked as a single lineage graph.

Audit logs

Databricks delivers audit events, including Unity Catalog events, to cloud storage (S3 or ADLS) in configurable log buckets. Audit events include data reads (SELECT queries), data writes, permission changes, login events, and compute access. Audit logs can be consumed in SIEM tools (Splunk, Datadog) or queried directly via Databricks SQL for compliance reporting.
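For compliance reporting, a common first step is to pull permission changes out of the delivered logs. The sketch below assumes the logs are JSON lines with `serviceName`, `actionName`, `userIdentity`, `requestParams`, and `timestamp` fields, and that permission changes appear with the action name `updatePermissions`. Verify both against your own log files before relying on this. The sample records are invented.

```python
import json


def permission_changes(log_lines: list[str]) -> list[dict]:
    """Extract Unity Catalog permission-change events from JSON-lines audit logs.

    Field names and the 'updatePermissions' action are assumptions to verify
    against the logs actually delivered to your bucket.
    """
    events = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        event = json.loads(line)
        if (event.get("serviceName") == "unityCatalog"
                and event.get("actionName") == "updatePermissions"):
            events.append({
                "timestamp": event.get("timestamp"),
                "user": event.get("userIdentity", {}).get("email"),
                "params": event.get("requestParams", {}),
            })
    return events


# Invented sample records, for illustration only.
sample = [
    json.dumps({"timestamp": 1700000000000, "serviceName": "unityCatalog",
                "actionName": "updatePermissions",
                "userIdentity": {"email": "admin@example.com"},
                "requestParams": {"securable_full_name": "prod.finance.orders"}}),
    json.dumps({"timestamp": 1700000001000, "serviceName": "unityCatalog",
                "actionName": "getTable",
                "userIdentity": {"email": "analyst@example.com"},
                "requestParams": {}}),
]
for change in permission_changes(sample):
    print(change["user"], change["params"])
```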

For regulated industries (financial services, healthcare, insurance), the combination of column-level access control, row-level security, column masking, and audit logs provides many of the technical controls that HIPAA, PCI DSS, and SOC 2 assessments look for. Compliance itself still depends on how those controls are configured and operated.

Migration from legacy Hive metastore

Migrating from the per-workspace Hive metastore to Unity Catalog involves:

1. Enable Unity Catalog on the Databricks account (requires account admin)

2. Create a metastore for the region

3. Assign workspaces to the metastore

4. Create External Locations for your cloud storage paths

5. Migrate tables: use the SYNC command to migrate Hive metastore databases to Unity Catalog schemas, or recreate table definitions pointing to existing Delta files

6. Recreate grants and permissions using Unity Catalog's GRANT model

7. Update job and notebook code from two-level (schema.table) to three-level (catalog.schema.table) namespace references

8. Enable row filters and column masks for tables containing sensitive data
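Step 5 can be rehearsed before anything is changed, because the SYNC command accepts a DRY RUN clause. The sketch below generates one dry-run statement per legacy schema. `sync_statements` is a hypothetical helper, the schema and catalog names are assumptions, and the target schemas are assumed to exist already in the target catalog.

```python
def sync_statements(schemas: list[str], target_catalog: str, dry_run: bool = True) -> list[str]:
    """Build SYNC SCHEMA statements that upgrade legacy schemas into a UC catalog.

    With dry_run=True the statements only report what would be upgraded.
    """
    suffix = " DRY RUN" if dry_run else ""
    return [
        f"SYNC SCHEMA {target_catalog}.{schema} FROM hive_metastore.{schema}{suffix}"
        for schema in schemas
    ]


# Hypothetical schema names, for illustration only.
for statement in sync_statements(["finance", "marketing"], target_catalog="prod"):
    print(statement)
```

Reviewing the dry-run output schema by schema gives an early list of tables that cannot be upgraded as-is, which helps size the migration before committing to a timeline.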

Databricks provides migration tooling, such as the Databricks Labs UCX project, that automates portions of this. The migration effort varies from weeks (a single workspace with a manageable table count) to months (large multi-workspace deployments with complex permission structures).

Unity Catalog vs other governance tools

Unity Catalog is Databricks-specific — it governs Databricks assets. For multi-platform governance (Databricks + Snowflake + BigQuery + Redshift), external catalogs like Apache Atlas, OpenMetadata, Alation, or Collibra can ingest Unity Catalog metadata via API to provide a cross-platform view.

For the broader Databricks platform context, see the Databricks pricing guide and Azure Synapse vs Databricks. For the open table formats that Unity Catalog governs, see the Delta Lake guide.

Our data architecture consulting practice implements Databricks Unity Catalog — from account setup and external location configuration through table migration, permission modelling, and row-level security implementation. If you are migrating from legacy Hive metastore or designing a new Unity Catalog governance model, book a free 30-minute audit.
