
Microsoft Purview: Data Governance for the Azure Ecosystem

James Okafor
Data & Cloud Engineer
September 29, 2026 · 10 min read

How Microsoft Purview provides data cataloguing, classification, lineage, and access governance for Azure data estates — what it does well, where its limitations are, and how to integrate it with Azure Synapse, Fabric, and Power BI.

Microsoft Purview is Microsoft's unified data governance platform for the Azure ecosystem. It provides data discovery (cataloguing what data exists and where), data classification (identifying sensitive data), data lineage (tracking how data flows through your systems), and access governance (controlling who can see what). For organisations with a significant Azure data estate, Purview is the natural starting point for enterprise data governance.

This guide covers what Purview actually does in practice — its capabilities, its limitations, and how it integrates with the Azure data services most organisations use alongside it.

Purview's Core Capabilities

**Data Map and Data Catalog.** The Data Map is Purview's automated discovery engine. You register data sources — Azure Synapse, Azure SQL Database, Azure Data Lake Storage, Azure Blob Storage, Power BI, and many others — and Purview scans them to discover what data they contain. Tables, views, files, and Power BI datasets are catalogued automatically with metadata: column names, data types, and row counts.

The Data Catalog provides the search and browse interface for the catalogued assets. Data consumers can search for specific data by name, keyword, or classification and see where it lives, what it contains, and (if Purview has lineage data) how it was produced.

**Data Classification.** Purview applies classification rules to discovered data, identifying sensitive content: names, email addresses, credit card numbers, national identification numbers, health information, and custom business-defined patterns. Classification happens automatically during the scan process using Microsoft's built-in classifiers and custom rules you define.

Classification is not access control — it identifies sensitive data but does not automatically restrict it. The value is visibility: knowing that a table in Azure Data Lake contains credit card numbers (when the security team thought it only contained aggregated metrics) is the first step toward appropriate governance.
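To make the idea of a pattern-based classifier concrete, here is a minimal sketch in plain Python. It is not Purview's classifier implementation: the regular expression, the Luhn checksum step, the function names, and the 60% match threshold are all assumptions chosen for illustration. It shows the general shape of the technique: match a pattern against sampled values, validate the candidates, and flag the column only when enough samples match.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def looks_like_card_number(value: str) -> bool:
    """Pattern match plus checksum, which cuts down on false positives."""
    match = CARD_PATTERN.search(value)
    if match is None:
        return False
    digits = re.sub(r"[ -]", "", match.group())
    return luhn_valid(digits)


def classify_column(sample_values: list[str], min_match_ratio: float = 0.6) -> bool:
    """Flag a column when enough sampled values look like card numbers."""
    if not sample_values:
        return False
    hits = sum(looks_like_card_number(v) for v in sample_values)
    return hits / len(sample_values) >= min_match_ratio


# Illustrative sample using well-known test card numbers, not real data.
sample = ["4111 1111 1111 1111", "5500-0000-0000-0004", "n/a"]
print(classify_column(sample))
```

The threshold matters in practice: set it too low and free-text columns with an occasional long number get flagged, set it too high and sparsely populated columns slip through.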

**Data Lineage.** Purview captures lineage from several sources automatically:

- Azure Data Factory: pipeline-level lineage showing which datasets feed into which pipelines and outputs

- Azure Synapse Analytics: SQL lineage for views and stored procedures

- Power BI: lineage from data sources through datasets to reports

- dbt: via the Purview dbt integration that reads dbt's manifest.json

For organisations using ADF, Synapse, and Power BI as their primary data stack, Purview provides reasonably complete end-to-end lineage without significant additional configuration — from source database tables through ADF pipelines through the data warehouse through Power BI datasets to published reports.
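As an illustration of the kind of information a lineage tool can read from dbt, the sketch below walks the `parent_map` section of a parsed `manifest.json` to list everything upstream of a model. How Purview's integration actually consumes the manifest is not described here, so treat this as an assumption-laden example: the `upstream_nodes` helper and the `shop` project with its model names are hypothetical, and the manifest is a hand-written stand-in rather than real dbt output.

```python
from collections import deque


def upstream_nodes(manifest: dict, node_id: str) -> list[str]:
    """Walk parent_map breadth-first and return every ancestor of node_id."""
    parent_map = manifest.get("parent_map", {})
    seen: set[str] = set()
    ordered: list[str] = []
    queue = deque(parent_map.get(node_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        ordered.append(current)
        queue.extend(parent_map.get(current, []))
    return ordered


# Hand-written stand-in for a parsed manifest.json (hypothetical project "shop").
manifest = {
    "parent_map": {
        "model.shop.fct_orders": ["model.shop.stg_orders", "model.shop.stg_payments"],
        "model.shop.stg_orders": ["source.shop.raw.orders"],
        "model.shop.stg_payments": ["source.shop.raw.payments"],
        "source.shop.raw.orders": [],
        "source.shop.raw.payments": [],
    }
}

ancestors = upstream_nodes(manifest, "model.shop.fct_orders")
for node in ancestors:
    print(node)
```

In a real project you would load the file with `json.load` from dbt's `target/` directory instead of writing the dictionary by hand.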

**Business Glossary.** Purview includes a business glossary for managing canonical term definitions and associating them with data assets. The glossary can be used to annotate tables and columns with business meaning, linking the technical asset (the revenue_net column in the fct_orders table) to the business definition ("net revenue after returns and discounts").

**Data Sharing (Microsoft Purview Data Sharing).** Purview supports in-place data sharing within Azure: sharing access to Azure Storage data with other Azure tenants without copying it. This is the Azure counterpart to Snowflake Data Sharing.

Limitations and Honest Assessment

Purview is a broad product, and breadth has come at the cost of depth in some areas.

**Scan quality varies by source.** Azure-native sources (Synapse, ADLS, Azure SQL) are well supported with high-quality automatic scanning. Non-Azure sources reached through supported connectors (on-premises SQL Server, Salesforce, Snowflake) scan with more variable quality, and column-level lineage is often missing for them.

**Lineage for complex transformations is incomplete.** ADF pipeline-level lineage is good. Column-level lineage through ADF Data Flows is available but can be incomplete for complex transformation logic. For Python-based transformations in Databricks or custom Azure Functions, lineage must be injected manually using the Purview SDK or the Apache Atlas API.
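A rough sketch of what manually injected lineage can look like: in the Apache Atlas model, lineage is expressed as a `Process` entity whose `inputs` and `outputs` reference existing data assets. The code below only builds the JSON payload and does not call any service. The asset type name `azure_datalake_gen2_path`, the qualified names, the storage account, and the helper functions are assumptions for illustration, so verify them against your own catalog and the current Purview documentation before sending anything.

```python
import json


def asset_ref(type_name: str, qualified_name: str) -> dict:
    """Reference an already-catalogued asset by its qualified name."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_process_payload(name: str, qualified_name: str,
                          inputs: list[dict], outputs: list[dict]) -> dict:
    """Build an Atlas-style bulk entity payload containing one Process entity."""
    return {
        "entities": [
            {
                "typeName": "Process",
                "attributes": {
                    "name": name,
                    "qualifiedName": qualified_name,
                    "inputs": inputs,
                    "outputs": outputs,
                },
            }
        ]
    }


# Hypothetical storage account, container, and job names.
LAKE = "https://examplelake.dfs.core.windows.net/lake"
payload = build_process_payload(
    name="clean_orders_notebook",
    qualified_name="databricks://example-workspace/jobs/clean_orders",
    inputs=[asset_ref("azure_datalake_gen2_path", f"{LAKE}/bronze/orders")],
    outputs=[asset_ref("azure_datalake_gen2_path", f"{LAKE}/silver/orders")],
)

print(json.dumps(payload, indent=2))
```

The payload would then be posted to the Atlas entity endpoint exposed by your Purview account, using whichever authenticated client you already have. Keeping payload construction separate from the HTTP call makes it easy to unit test the lineage your jobs claim to produce.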

**The user experience is improving but uneven.** Purview has been through significant UI evolution. The current interface is functional but not polished — search is workable, the lineage graph visualisation is useful for small lineage trees but becomes cluttered for large ones, and the glossary and classification workflows require patience.

**Active metadata and automation are limited.** Purview is primarily a read-and-discover tool. Automatically propagating classifications downstream (for example, carrying a PII classification on a source column through to every table derived from it) requires additional tooling or manual effort. Workflow automation based on Purview events is possible via Azure Event Grid but requires custom development.
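The propagation gap is easier to reason about with a small example. The sketch below is the kind of additional tooling you might build yourself, not a Purview feature: given a lineage graph and the classifications on source assets, it pushes each label to everything derived from that asset. The asset names and the `propagate_classifications` helper are hypothetical, and the example works at table level for brevity even though real classifications are usually column-level.

```python
from collections import deque


def propagate_classifications(downstream: dict[str, list[str]],
                              classifications: dict[str, set[str]]) -> dict[str, set[str]]:
    """Copy each asset's classifications to every asset derived from it."""
    result = {asset: set(labels) for asset, labels in classifications.items()}
    queue = deque(asset for asset, labels in classifications.items() if labels)
    while queue:
        asset = queue.popleft()
        for child in downstream.get(asset, []):
            before = len(result.setdefault(child, set()))
            result[child] |= result[asset]
            # Revisit the child only if it gained a label, so cycles terminate.
            if len(result[child]) > before:
                queue.append(child)
    return result


# Hypothetical lineage: asset -> assets derived from it.
downstream = {
    "raw.customers": ["stg.customers"],
    "stg.customers": ["mart.customer_orders"],
    "raw.orders": ["mart.customer_orders"],
}
classifications = {"raw.customers": {"PII"}}

propagated = propagate_classifications(downstream, classifications)
for asset, labels in sorted(propagated.items()):
    print(asset, sorted(labels))
```

A real implementation would read the graph and labels from the catalog and write the results back through its API, with a review step before anything is applied automatically.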

Integration with the Azure Data Stack

**Azure Synapse Analytics.** Purview is designed as Synapse's governance layer. Synapse workspaces can be connected to a Purview account, enabling Synapse Studio to show Purview lineage and classifications inline. Synapse SQL dedicated pools and serverless SQL pools are scanned automatically.

**Microsoft Fabric.** Purview integration with Fabric is available but evolving. Fabric workspaces can be registered as data sources. Lineage from Fabric Dataflows and Fabric Notebooks is captured with varying completeness. As Fabric matures and Purview's Fabric integration deepens, this will become the primary governance integration for new Azure analytics builds.

**Power BI.** Purview captures Power BI lineage automatically when Power BI is registered as a data source. Datasets, reports, and dashboards appear in the Data Catalog. Lineage shows which Purview-catalogued data sources feed into which Power BI datasets. This enables the complete lineage chain: source system → ADF pipeline → Synapse → Power BI dataset → Power BI report.

**Azure Data Lake Storage.** Purview scans ADLS Gen2 containers and classifies data in Parquet, CSV, JSON, and other supported formats. For organisations using ADLS as their data lake layer (bronze/silver/gold zones), Purview provides discovery and classification across the lake without requiring the data to be loaded into a structured database.

**Databricks.** Purview supports Databricks as a registered source for scanning and lineage, but coverage is less comprehensive than for native Azure services. Column-level lineage for Spark transformations requires the Apache Atlas connector and Databricks Unity Catalog. For organisations using Databricks as their primary transformation platform, Unity Catalog may provide better lineage coverage within the Databricks ecosystem than Purview.

When to Use Purview vs Alternatives

Purview is the right choice when:

- You are primarily Azure-native (Synapse, ADF, Power BI) and want governance with minimal configuration

- You need data classification for compliance (GDPR, HIPAA) and Microsoft's built-in classifiers cover your requirements

- You want integrated lineage across the Azure stack without building custom solutions

Consider alternatives (Atlan, Alation, Collibra) when:

- Your data estate is multi-cloud or multi-platform (Snowflake-centric, Databricks-heavy, significant on-premises)

- You need deep column-level lineage across complex Python/Spark transformations

- Your governance workflow requirements (approval processes, stewardship workflows, data access management) go beyond Purview's current capabilities

- You need a more mature user experience and are willing to pay for it

For Azure data platform governance design, including Purview implementation and integration with Synapse, Fabric, and Power BI, our cloud engineering services cover end-to-end Azure architecture. Contact us to discuss your data governance requirements.
