BlogData Architecture

Data Security Architecture: Access Controls, Encryption, and Governance for Analytics Environments

Austin Duncan
Austin Duncan
Managing Director & Principal Data Architect
·June 26, 202713 min read

Analytics environments handle some of the most sensitive data in an organisation: customer financial records, employee data, operational metrics that would be material to competitors. Securing these environments requires specific architectural decisions — not just perimeter security, but data-layer controls that limit exposure even when the perimeter is breached.

Analytics environments are high-value targets. They aggregate data from many source systems — production databases, CRM platforms, financial systems, HR records — into a single location that is designed to be queried easily. The very accessibility that makes analytics valuable also creates a concentrated data risk: a single credential compromise or access control failure can expose what would otherwise require breaching many separate systems.

Securing analytics environments requires both perimeter security (network access controls, authentication) and data-layer security (column-level controls, row-level filters, encryption of sensitive fields). Perimeter security alone is insufficient; data-layer controls limit exposure even when the perimeter is penetrated.

Access Control Architecture

**Principle of least privilege at the data layer**: Every user and every service account should have access to the minimum data required for their role. This is easy to state and frequently violated in practice: administrators who set up the data warehouse give broad permissions for convenience, and those permissions are rarely narrowed as the environment matures.

Implement role-based access control at the data layer — not just at the warehouse login level. In Snowflake, this means functional roles (FINANCE_READ, MARKETING_ANALYST, DATA_ENGINEER_PROD) assigned to users, not direct grants to individual users. Role membership changes are manageable; individual grants accumulate into an unauditable mess.

In Databricks Unity Catalog, table grants, schema grants, and catalog grants are all separate. Define access at the most granular level that is practical: grant SELECT on specific tables in the consumption schema, not on the entire schema or catalog.

In BigQuery, IAM roles combined with dataset-level and table-level permissions define access. Avoid project-level BigQuery Data Viewer grants that expose all datasets; prefer dataset-level grants.

**Column-level security for sensitive fields**: Fields like SSN, date of birth, exact salary, credit card number, and similar PII or financial data should not be visible to all users with access to a table. Column-level security restricts which roles can see specific columns; other roles see null or a masked value.

Snowflake column masking policies define a masking function per data type per column. A SALARY column can be masked to show the value for FINANCE role, a range bucket for HR_GENERAL role, and null for all other roles. The masking policy is applied at query time without duplicating the table.

BigQuery column-level security uses policy tags: columns tagged with a specific policy tag are restricted to users with the decrypter role for that tag. Policy tags are defined in BigQuery Data Catalog.

**Row-level security for multi-tenant data**: Analytics tables that aggregate data across tenants, regions, or business units must ensure each user sees only the data they are authorised to access. Row-level security filters are applied at query time based on the user's identity.

In Snowflake, row access policies are functions applied to a table that return a filter condition based on the current user or current role. In Databricks, row filters are SQL expressions defined in Unity Catalog applied automatically to all table queries. In Redshift, row-level security is implemented via views that include a WHERE clause referencing the current user — or via Redshift's native row-level security feature (available in recent versions).

Encryption Architecture

**Encryption at rest**: All major cloud data warehouses (Snowflake, BigQuery, Redshift, Databricks) encrypt data at rest by default using platform-managed keys. For regulatory requirements that demand customer control over encryption keys, all four platforms support customer-managed encryption keys (CMEK / Bring Your Own Key / External Key Management). CMEK provides the ability to revoke access by revoking the key, but introduces operational risk: if the key is lost or the KMS is unavailable, the data becomes inaccessible.

**Encryption in transit**: All cloud warehouse connections should use TLS. Verify TLS is enforced for all connection types — JDBC/ODBC connections, REST API calls, replication connections. Some older connectors default to unencrypted connections; disable the unencrypted option at the server configuration level.

**Application-level encryption for specific fields**: For fields that require protection even from database administrators — high-value PII where even internal access should be audited and minimised — application-level encryption encrypts at the application layer before writing to the database. The database sees encrypted values; only the application with the decryption key can read the plaintext. This is appropriate for fields like SSN or payment card data in environments with strict compliance requirements.

Audit Logging

Security architecture without audit logging provides controls without visibility. Audit logs record who accessed what data, when, and what queries they ran. This is essential for:

- Detecting anomalous access patterns that indicate credential compromise

- Satisfying regulatory requirements for access auditing (SOC 2, HIPAA, GDPR data access logs)

- Investigating incidents retrospectively

**Snowflake**: Query history in ACCOUNT_USAGE views records all queries run, the user who ran them, and the data accessed. Access history (ACCESS_HISTORY view) records which tables and columns each query read and wrote. Both are retained for 365 days by default.

**BigQuery**: Cloud Audit Logs record all data access, admin activity, and system events. Data Access audit logs must be explicitly enabled for user read/write operations — they are not enabled by default.

**Databricks Unity Catalog**: Audit logs record all data access, permission changes, and governance events. Logs can be forwarded to SIEM systems via the audit log delivery configuration.

Route audit logs to a SIEM or log analytics platform — querying audit logs ad-hoc in the warehouse is not operationally scalable for security monitoring. A basic alert on anomalous query volume (more than 10x normal queries for a user in a day) or after-hours access to sensitive tables provides early warning of compromised credentials.

Network Security

**Private connectivity**: Production data warehouses should not be accessible over the public internet. Use private networking:

- Snowflake: Private Link (AWS PrivateLink, Azure Private Link, GCP Private Service Connect) makes Snowflake accessible only from within your private network

- BigQuery: VPC Service Controls create a perimeter that limits access to BigQuery from specific networks and service accounts

- Redshift: Deploy within a VPC; restrict security group rules to the minimum necessary source IPs and ports

**IP allowlisting**: For connections that cannot use private networking (developers connecting from home, third-party tools), IP allowlisting restricts warehouse access to known IP ranges. Combine with MFA for user authentication.

**Service account management**: Data pipeline service accounts should have programmatic access only — no interactive login. Rotate service account credentials on a defined schedule (quarterly is standard). Inventory service accounts regularly; orphaned service accounts from decommissioned pipelines are a persistent security risk.

Our data architecture practice designs security architectures for analytics environments that meet regulatory requirements without sacrificing analytical usability — contact us to discuss data security architecture for your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →