BlogData Architecture

Access Control in the Data Warehouse: Row-Level Security, Column Masking, and Governance

James Okafor
James Okafor
Senior Data Engineer
·August 11, 202711 min read

Data warehouse access control determines who can see what data and under what conditions. As analytics environments grow, the simple approach — separate tables or environments for different audiences — becomes unmanageable. Modern warehouses provide row-level security, dynamic data masking, and column-level permissions that make fine-grained access control tractable at scale.

Data warehouse access control determines who can see what data and under what conditions. As analytics environments grow — more users, more use cases, more sensitive data — the simple approach of separate tables or separate environments for different audiences becomes unmanageable. Modern cloud warehouses provide row-level security, dynamic data masking, and column-level permissions that make fine-grained access control tractable at scale. Designing this correctly from the start is significantly less expensive than retrofitting it after the fact.

The Access Control Challenge

The naive approach to data warehouse access control is role-based object permissions: grant ANALYST_ROLE access to the reporting schema, grant FINANCE_ROLE access to the financial tables, deny everything else. This works for coarse-grained access but breaks down when:

**Sensitivity varies within a table** — a customer table contains both low-sensitivity attributes (account status, industry) and high-sensitivity attributes (full name, email, national ID). Granting access to the table grants access to all columns. Column-level controls are required to allow access to the low-sensitivity columns while masking the high-sensitivity ones.

**Access should vary by row** — a regional sales dashboard should show each regional manager only the records for their region. A support agent should be able to see the records for customers they are assigned to, not all customers. Row-level security implements this at the database layer, so the same table serves all users with each user seeing only the rows they are authorised to see.

**The same user needs different access in different contexts** — an analyst who is also a data steward may legitimately need to see unmasked PII for specific compliance tasks while their standard analyst role should see masked values. Context-based access requires role switching or context-aware masking policies rather than static role assignments.

Row-Level Security Implementation

Row-level security (RLS) restricts which rows a user can see in a query result, based on an attribute of the user (their role, their region, their team membership) and an attribute of the row (the region field, the team field, the account owner field).

The implementation mechanism varies by warehouse:

**Snowflake** implements RLS through Row Access Policies: a policy is created as a function that returns true or false for each row based on the current user's attributes. The policy is attached to the table and applies transparently to all queries — users who run a SELECT query see only the rows for which the policy returns true. Snowflake Row Access Policies can reference a mapping table that links user identifiers to permitted attributes, enabling dynamic access control without hardcoding user lists in the policy.

**BigQuery** uses row-level access policies directly on tables, with a filter expression that references the current user or their group memberships. Row-level policies in BigQuery are based on IAM group membership, making them manageable through the same identity infrastructure used for broader Google Cloud access control.

**Databricks** implements RLS through Unity Catalog row filter functions, which are attached to tables and operate on the same principle as Snowflake's Row Access Policies: a function returns a boolean filter condition based on the current user context.

The common architectural pattern across all platforms is a user-attribute mapping table: a table that stores, for each user or role, the values they are permitted to see. The RLS policy joins against this mapping table to determine the permitted rows. Maintaining this mapping table is the operational responsibility of the access control programme — as users join, move between teams, and leave, the mapping table needs to be updated.

Dynamic Data Masking

Dynamic data masking (DDM) returns masked values in place of sensitive data for users who do not have the authorisation to see the unmasked value. The masking happens at query time — the underlying data is not modified; users with appropriate privileges see the actual value while users without see a masked representation.

Masking strategies by data type:

**Full masking** — replaces the value entirely with a constant or null: an email address becomes 'MASKED', a national ID becomes 'XXX-XX-XXXX'. Appropriate for fields where even a partial value would be identifying or where no analytical use requires the actual content.

**Partial masking** — retains part of the value while obscuring the sensitive portion: a credit card number shows as 'XXXX-XXXX-XXXX-1234' (last four digits visible), an email shows as 'j***@example.com'. Useful when support agents need to verify identity without seeing the full value.

**Format-preserving masking** — replaces the sensitive value with a synthetic value of the same format: a real email address replaced with a fictional but format-valid email address. Useful for development and testing environments where the schema must be real but the data must not be.

**Tokenisation** — replaces the value with a reversible token. Authorised processes can detokenise to recover the original value; unauthorised access only sees the token. Used in payment card processing and where the actual value may need to be recovered for specific authorised operations.

In Snowflake, Dynamic Data Masking is implemented as a masking policy: a function that returns the actual value to users with a specified role and a masked value to all others. The masking policy is attached to the column; changing the policy updates masking for all queries without table modifications.

Role Design for Analytics Environments

Role design is the governance layer of access control. A well-designed role hierarchy makes access management tractable; a poorly designed hierarchy creates a proliferating set of one-off grants that is unmanageable and unauditable.

The standard pattern for analytics warehouse role design:

**Functional roles** correspond to job functions: ANALYST, FINANCE_ANALYST, DATA_ENGINEER, DATA_STEWARD, BI_DEVELOPER. Each functional role has the permissions appropriate for its job function.

**Privilege roles** accumulate permissions: READ_REPORTING has SELECT on the reporting schema; WRITE_STAGING has INSERT on the staging schema. Privilege roles are granted to functional roles, not to users directly.

**User groups** map users to functional roles through identity management (Active Directory, Okta, Google Groups). Users are never granted permissions directly; they are granted functional roles, which inherit privilege roles.

**Service accounts** for automated processes (ETL pipelines, BI tool connections, orchestration) should have minimal necessary permissions. The Tableau service account needs SELECT on the tables it queries; it does not need CREATE TABLE or DELETE permissions. Service account permissions should be audited quarterly to remove permissions that are no longer used.

The operational requirement is that role assignment is maintained as users change roles and leave the organisation. Quarterly access reviews — checking that every user's functional role matches their current job function and that no former employees retain active accounts — are the standard governance control for access management.

Our data architecture practice designs data warehouse access control architecture for organisations managing sensitive data at scale — contact us to discuss access control for your analytics environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →