How to build data privacy compliance into your analytics data stack — the specific technical controls required by GDPR, CCPA, and HIPAA, how to implement them in a cloud data warehouse, and what your data architecture must support to respond to data subject requests.
Data privacy regulations — GDPR, CCPA, and HIPAA — impose specific technical controls on how organisations collect, store, process, and delete personal data. These controls are not abstract legal requirements; they translate directly into specific data architecture and engineering decisions. Data teams that understand what each regulation actually requires technically are far better positioned to build compliant systems than those who rely on legal teams to interpret requirements without technical input.
This guide covers the specific technical requirements of each regulation and how they translate into data architecture and data warehouse design decisions.
GDPR: European Data Privacy
GDPR (General Data Protection Regulation) applies to organisations established in the EU and to organisations elsewhere that offer goods or services to, or monitor the behaviour of, individuals in the EU. The regulation defines personal data broadly: any information relating to an identified or identifiable natural person, such as names, email addresses, IP addresses, behavioural tracking data, and location data.
**The right to access (Article 15)**: data subjects can request a copy of all personal data an organisation holds about them. Technically, this requires the ability to query all systems that hold personal data for a specific individual and compile the results. The systems that hold personal data include: CRM, web analytics, marketing platforms, and frequently your data warehouse (which may have copies of customer transaction data, behaviour data, and other personal attributes).
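In practice, an access request comes down to three steps: know which tables identify a person, query each one for that subject, and compile the results. The minimal sketch below assumes a hypothetical registry (`SUBJECT_ID_COLUMNS`) that maps each table to its subject-ID column, and it uses an in-memory SQLite database as a stand-in for the warehouse. External systems such as a CRM or marketing platform would each need their own export step.

```python
import json
import sqlite3

# Hypothetical registry: table -> column identifying the data subject.
# In practice this would be generated from your catalog or classification tags.
SUBJECT_ID_COLUMNS = {
    "crm_contacts": "customer_id",
    "web_events": "customer_id",
    "orders": "customer_id",
}


def compile_access_request(conn: sqlite3.Connection, customer_id: str) -> dict:
    """Collect every row tied to one data subject across all registered tables."""
    report = {}
    for table, id_column in SUBJECT_ID_COLUMNS.items():
        # Table/column names come from the trusted registry, never user input;
        # the subject ID is passed as a bound parameter.
        cursor = conn.execute(
            f"SELECT * FROM {table} WHERE {id_column} = ?", (customer_id,)
        )
        columns = [d[0] for d in cursor.description]
        report[table] = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return report


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE crm_contacts (customer_id TEXT, email TEXT);
        CREATE TABLE web_events (customer_id TEXT, page TEXT);
        CREATE TABLE orders (customer_id TEXT, amount REAL);
        INSERT INTO crm_contacts VALUES ('c1', 'ana@example.com');
        INSERT INTO web_events VALUES ('c1', '/pricing'), ('c1', '/signup');
        """
    )
    print(json.dumps(compile_access_request(conn, "c1"), indent=2))
```

Returning an empty list for tables with no matching rows is deliberate: the response then documents that each registered system was checked.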
**The right to erasure (Article 17)**: data subjects can request deletion of their personal data. In a data warehouse context, this creates significant architectural challenges. Row deletion in a historical analytics table is straightforward; deletion from Parquet files on a data lake (which are immutable) requires rewriting the affected files; deletion from backups may require a separate backup anonymisation process.
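Because lake files cannot be edited in place, erasure means writing a filtered copy and swapping it in. The sketch below illustrates that pattern under one loud assumption: it uses CSV through the standard library as a stand-in for Parquet so it runs without extra dependencies. The function name `erase_subject_from_file` is hypothetical.

```python
import csv
import os
import tempfile


def erase_subject_from_file(path: str, id_column: str, subject_id: str) -> int:
    """Rewrite a data file without one subject's rows; returns rows removed.

    CSV stands in for Parquet here; the pattern (read, filter, write a new
    file, swap it in) is the same for any immutable file format.
    """
    with open(path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)

    kept = [row for row in rows if row[id_column] != subject_id]
    removed = len(rows) - len(kept)
    if removed == 0:
        return 0  # nothing to erase, so leave the original file untouched

    # Write the filtered copy beside the original, then atomically replace it
    # so readers never see a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    os.replace(tmp_path, path)
    return removed
```

Table formats such as Delta Lake and Apache Iceberg expose DELETE statements that handle this bookkeeping for you, but the superseded files stay on storage until they are cleaned up (VACUUM in Delta Lake, snapshot expiry in Iceberg), so the erasure is only complete after that cleanup runs.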
**Data minimisation (Article 5)**: organisations should collect only the personal data necessary for the specified purpose. In practice, this means analytics systems should not copy PII fields that are not required for analytical use. A user_id is sufficient for most analytics; copying names, email addresses, and physical addresses into your data warehouse may not be necessary.
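One way to enforce minimisation at ingestion is an explicit allowlist of fields that analytics actually needs. A minimal sketch, where `ANALYTICS_FIELDS` is a hypothetical allowlist you would agree per use case:

```python
# Hypothetical allowlist: the only source fields the analytics use case needs.
ANALYTICS_FIELDS = frozenset({"user_id", "plan", "signup_date", "country"})


def minimise(record: dict, allowed: frozenset = ANALYTICS_FIELDS) -> dict:
    """Keep only allowlisted fields; everything else is dropped before loading."""
    return {key: value for key, value in record.items() if key in allowed}


source = {"user_id": "u1", "name": "Ana", "email": "ana@example.com", "plan": "pro"}
print(minimise(source))  # names and emails never reach the warehouse
```

An allowlist is preferable to a denylist here: when an upstream system adds a new PII field, it is excluded by default instead of silently flowing into analytical tables.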
**Pseudonymisation**: GDPR encourages pseudonymisation (replacing direct identifiers with pseudonymous IDs) as a privacy-by-design technique. In analytics architectures, this means using a pseudonymous customer ID in analytical tables rather than name and email, with the mapping between pseudonymous ID and identity held in a separate, more restricted system. Pseudonymised data is still personal data under GDPR, so this reduces risk rather than taking the data out of scope.
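A common way to derive stable pseudonymous IDs is a keyed hash such as HMAC-SHA256. The sketch below is illustrative only: the class name is hypothetical, the in-memory dictionary stands in for the restricted identity store, and it assumes identifiers like emails can be compared case-insensitively.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class Pseudonymiser:
    """Derive stable pseudonymous IDs with a keyed hash (HMAC-SHA256).

    The key and the reverse mapping belong in a separate, restricted system;
    only the pseudonymous IDs should reach analytical tables.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        # Stand-in for the restricted identity store: pseudonymous ID -> identifier.
        self._identity_map: dict = {}

    def pseudonymise(self, identifier: str) -> str:
        # Assumption: identifiers (e.g. emails) are compared case-insensitively.
        normalised = identifier.strip().lower()
        pseudo_id = hmac.new(
            self._key, normalised.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        self._identity_map[pseudo_id] = normalised
        return pseudo_id

    def reidentify(self, pseudo_id: str) -> Optional[str]:
        # Only the restricted system should ever be able to call this.
        return self._identity_map.get(pseudo_id)


if __name__ == "__main__":
    # In practice the key would come from a secrets manager, not be generated per run.
    p = Pseudonymiser(secrets.token_bytes(32))
    print(p.pseudonymise("ana@example.com"))
```

The key matters: a plain, unkeyed hash of an email can be reversed by hashing candidate addresses, whereas without the key an attacker cannot recompute the mapping.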
**Data retention limits**: personal data should not be retained longer than necessary for the specified purpose. Analytics teams often retain historical data indefinitely; GDPR requires retention policies that delete or anonymise data after a defined period.
Technical controls required:
- Column-level data classification identifying all PII fields in the data warehouse
- Column-level security or masking restricting access to PII fields to authorised roles
- Data subject access request (DSAR) workflow with the ability to query all systems
- Data deletion capability covering data lake files (e.g. Parquet) as well as warehouse tables
- Documented data retention and deletion policies with automated enforcement
CCPA: California Consumer Privacy
CCPA (California Consumer Privacy Act, and its extension CPRA) applies to for-profit businesses that meet size thresholds and collect personal information of California residents. CCPA's requirements overlap substantially with GDPR but differ in scope and emphasis.
**Right to know and right to access**: similar to GDPR Article 15. Consumers can request disclosure of what personal information is collected about them and how it is used.
**Right to delete**: similar to GDPR right to erasure, with some business exception categories.
**Right to opt out of sale or sharing**: consumers can opt out of the sale or sharing of their personal information to third parties. In a data analytics context, CPRA's definition of "sharing" covers disclosing personal information to advertising platforms for cross-context behavioural advertising, even when no money changes hands. Technical implementation requires maintaining opt-out status and excluding opted-out users from data shared with advertising partners.
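The exclusion step can be sketched as a single function that every advertising export passes through. The record shape and the `opted_out` field are hypothetical; treating an unknown status (`None`) as excluded is a conservative design choice, not a statutory requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserRecord:
    user_id: str
    hashed_email: str
    # Hypothetical consent flag: True = opted out, False = not opted out,
    # None = status unknown (e.g. record predates consent capture).
    opted_out: Optional[bool]


def build_ad_audience(users: List[UserRecord]) -> List[dict]:
    """Payload for an advertising partner, excluding anyone who opted out.

    Records with unknown status are excluded too, so the export fails closed.
    """
    return [
        {"user_id": u.user_id, "hashed_email": u.hashed_email}
        for u in users
        if u.opted_out is False
    ]
```

Centralising the filter (in code like this or in a warehouse view that all reverse ETL syncs read from) means an individual integration cannot forget to apply it.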
**Sensitive personal information**: CCPA specifically defines a category of sensitive personal information (social security numbers, financial data, health data, precise geolocation, racial/ethnic origin) with additional restrictions. Audit whether your analytics systems hold sensitive personal information and apply additional controls if they do.
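Part of that audit can be automated by scanning sampled values for recognisable patterns. The sketch below covers only one category, US Social Security numbers in the common `NNN-NN-NNNN` format; the pattern and function name are illustrative assumptions, and categories such as health data or precise geolocation need classification by meaning rather than by regex.

```python
import re

# Heuristic for SSNs written as NNN-NN-NNNN. It misses unformatted numbers and
# can flag look-alikes, so treat hits as candidates for review, not proof.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like_values(sample_rows):
    """Count, per column, sampled values that look like SSNs."""
    counts = {}
    for row in sample_rows:
        for column, value in row.items():
            if value is not None and SSN_PATTERN.search(str(value)):
                counts[column] = counts.get(column, 0) + 1
    return counts


sample = [{"id": 1, "notes": "SSN 123-45-6789 on file", "phone": "555-123-4567"}]
print(find_ssn_like_values(sample))  # free-text columns are a frequent hiding place
```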
Technical controls required:
- Opt-out tracking in the data warehouse (a flag on the user record indicating consent status)
- Downstream activation controls (reverse ETL and marketing integrations must respect the opt-out flag)
- Documented categories of personal information collected, with purpose specification
HIPAA: US Health Data
HIPAA (Health Insurance Portability and Accountability Act) applies to covered entities (healthcare providers, health plans, healthcare clearinghouses) and their business associates (vendors that handle protected health information — PHI). PHI includes any health information that can identify an individual.
HIPAA's Security Rule requires technical safeguards for PHI:
**Access controls**: unique user identification, automatic logoff, and encryption and decryption of PHI. In a data warehouse context: MFA for all access, no shared accounts, session timeouts, and encryption at rest and in transit.
**Audit controls**: hardware, software, and procedural mechanisms to record and examine access to PHI. In practice: comprehensive query logging for all access to tables containing PHI, with logs retained for 6+ years.
**Integrity controls**: mechanisms to ensure PHI is not altered or destroyed improperly. In practice: write-once storage for PHI audit logs, change data capture logging for PHI tables.
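Write-once storage (for example S3 Object Lock) prevents changes; a hash chain additionally makes tampering detectable when logs are verified. The sketch below is a minimal illustration with hypothetical function names: each entry's hash covers the previous entry's hash, so editing or reordering any entry breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, entry: dict) -> str:
    payload = json.dumps(entry, sort_keys=True)  # canonical form for hashing
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    """Append an audit entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, entry)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; edited, removed, or reordered entries break the chain.

    Truncating the tail is not detectable this way; anchor the latest hash
    somewhere external (e.g. a separate system) to catch that.
    """
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash or record["hash"] != _entry_hash(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```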
**Transmission security**: encryption of PHI in transit. In practice: TLS 1.2+ for all connections to systems holding PHI.
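Most warehouse drivers have their own TLS settings, but for custom clients that talk to systems holding PHI, Python's standard `ssl` module can enforce the floor directly. A minimal sketch (the function name is illustrative):

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client TLS context that verifies certificates and refuses anything below TLS 1.2."""
    context = ssl.create_default_context()  # certificate and hostname checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Recent Python versions already default to TLS 1.2 as the minimum; setting it explicitly documents the requirement and keeps it in force if defaults change.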
**De-identification**: HIPAA defines two de-identification methods, Safe Harbor and Expert Determination. Properly de-identified data is not PHI and is not subject to HIPAA restrictions. The Safe Harbor method requires removing 18 specific identifiers (names, geographic subdivisions smaller than a state, all elements of dates except the year, telephone numbers, email addresses, SSNs, etc.) and having no actual knowledge that the remaining data could identify the individual. Properly de-identified analytical datasets can be processed without HIPAA controls.
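A few of the Safe Harbor transformations are mechanical enough to sketch. The example below is deliberately partial: it handles a handful of direct identifiers, ZIP codes (keep the first three digits only if the area they cover has more than 20,000 people, otherwise `000`), and birth dates (keep the year, except that ages over 89 collapse into a single 90+ category). The field names and the `restricted_zip3` input are hypothetical; it does not cover all 18 identifiers.

```python
from datetime import date

# Direct identifiers this sketch removes outright (a subset of the 18).
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def deidentify(record: dict, restricted_zip3: set, as_of: date) -> dict:
    """Apply a few Safe Harbor transformations to one record (not all 18).

    restricted_zip3: 3-digit ZIP prefixes whose combined area has 20,000 or
    fewer people; a hypothetical input you would derive from Census data.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    birth = out.pop("birth_date", None)
    if birth is not None:
        age = as_of.year - birth.year - ((as_of.month, as_of.day) < (birth.month, birth.day))
        if age > 89:
            out["age_band"] = "90+"  # even the birth year would reveal an age over 89
        else:
            out["birth_year"] = birth.year  # keep only the year
    return out
```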
Technical controls required:
- Encryption at rest and in transit for all systems holding PHI
- Comprehensive access logging with 6-year retention
- Access controls with MFA and session management
- Business Associate Agreements (BAAs) with all cloud vendors that process PHI: Snowflake, AWS, Google Cloud, and Azure all offer BAAs covering their HIPAA-eligible services
Technical Implementation in Cloud Data Warehouses
**Column-level data classification.** The foundation of compliance. Every column in the data warehouse that contains personal data should be tagged with its classification (PII, sensitive PII, health data, financial data). This classification drives downstream access controls, masking policies, and deletion scope. Snowflake's object tags, BigQuery's policy tags, and Databricks Unity Catalog's column tags all support this.
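To make "classification drives downstream controls" concrete, the sketch below models a small catalog of column tags and derives masking targets and deletion scope from it. The catalog contents and helper names are hypothetical; in a warehouse the tags would live in the platform's own tagging feature and these queries would run against its metadata.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PII = "pii"
    SENSITIVE_PII = "sensitive_pii"
    HEALTH = "health"
    FINANCIAL = "financial"


@dataclass(frozen=True)
class ColumnTag:
    table: str
    column: str
    classification: Classification


# Hypothetical catalog contents.
CATALOG = [
    ColumnTag("customers", "email", Classification.PII),
    ColumnTag("customers", "ssn", Classification.SENSITIVE_PII),
    ColumnTag("claims", "diagnosis_code", Classification.HEALTH),
    ColumnTag("payments", "card_last4", Classification.FINANCIAL),
]


def columns_to_mask(catalog, classifications):
    """(table, column) pairs that need a masking policy for these classifications."""
    return sorted((t.table, t.column) for t in catalog if t.classification in classifications)


def tables_in_deletion_scope(catalog):
    """Every table holding any classified column must be covered by erasure jobs."""
    return sorted({t.table for t in catalog})
```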
**Column masking policies.** Masking policies hide sensitive data from unauthorised roles. A masking policy on an email column returns the full email to authorised analysts and NULL (or a hashed value) to unauthorised roles. Masking is applied at query time — the underlying data is stored unmasked, but queries from unauthorised roles see masked values.
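In Snowflake, this logic lives in a masking policy body that branches on `CURRENT_ROLE()`. The Python sketch below only emulates that decision so the behaviour is easy to see and test; the role names are hypothetical.

```python
import hashlib

# Hypothetical role names; a real policy would reference warehouse roles.
FULL_ACCESS_ROLES = {"PII_ANALYST", "PRIVACY_OFFICER"}
HASHED_ACCESS_ROLES = {"ANALYST"}


def mask_email(value, role):
    """Return the email as a given role would see it at query time."""
    if value is None or role in FULL_ACCESS_ROLES:
        return value
    if role in HASHED_ACCESS_ROLES:
        # A deterministic hash still supports joins and distinct counts.
        # Caveat: unkeyed hashes of emails can be reversed by guessing inputs.
        return hashlib.sha256(value.lower().encode("utf-8")).hexdigest()
    return None  # every other role sees NULL
```

Because masking happens at read time, changing who sees what is a policy update rather than a data migration.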
**Row access policies.** In multi-tenant systems, or systems where users should see only the rows that belong to them, row access policies filter query results based on the querying user's identity or role.
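In Snowflake, a row access policy returns a boolean per row, typically by checking `CURRENT_USER()` or `CURRENT_ROLE()` against a mapping table. The sketch below emulates that filter for a multi-tenant table; the mapping and function name are hypothetical.

```python
# Hypothetical mapping table from user identity to tenant.
USER_TENANTS = {
    "alice@acme.example": "acme",
    "bob@globex.example": "globex",
}


def rows_visible_to(rows, current_user):
    """Filter rows the way a row access policy would: by who is querying."""
    tenant = USER_TENANTS.get(current_user)
    if tenant is None:
        return []  # unmapped users see nothing (fail closed)
    return [row for row in rows if row["tenant_id"] == tenant]
```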
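**Data retention automation.** Define retention periods for each data classification tier and implement automated deletion or anonymisation after the retention window expires. In Snowflake, scheduled tasks can run the deletion or anonymisation statements. Note that Time Travel works against erasure: deleted rows remain recoverable for the table's Time Travel retention period (DATA_RETENTION_TIME_IN_DAYS) and, for permanent tables, a further seven days of Fail-safe, so the effective deletion window includes both.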
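The enforcement step itself is simple once the policy is written down as data. The sketch below uses SQLite as a stand-in for the warehouse and a hypothetical `RETENTION_POLICY` (table, date column, days to keep, action); a scheduled task would run the equivalent statements.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical policy: table -> (date column, days to keep, action on expiry).
RETENTION_POLICY = {
    "web_events": ("event_date", 395, "delete"),
    "orders": ("order_date", 2555, "strip_identifier"),
}


def enforce_retention(conn: sqlite3.Connection, today: date) -> dict:
    """Delete or de-identify expired rows; returns rows affected per table."""
    affected = {}
    for table, (date_column, keep_days, action) in RETENTION_POLICY.items():
        # ISO-8601 date strings compare correctly as text.
        cutoff = (today - timedelta(days=keep_days)).isoformat()
        if action == "delete":
            sql = f"DELETE FROM {table} WHERE {date_column} < ?"
        else:
            # Assumes customer_id is the table's only identifying column; real
            # anonymisation usually needs more than nulling one field.
            sql = (
                f"UPDATE {table} SET customer_id = NULL "
                f"WHERE {date_column} < ? AND customer_id IS NOT NULL"
            )
        affected[table] = conn.execute(sql, (cutoff,)).rowcount
    conn.commit()
    return affected
```

Returning affected-row counts gives the job an auditable record that the policy actually ran.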
**Audit logging.** Every query that touches personal data should be logged. Snowflake's QUERY_HISTORY and ACCESS_HISTORY views, BigQuery's Data Access audit logs, and similar warehouse features provide query-level audit trails; ACCESS_HISTORY additionally records which columns each query read. Store audit logs in a separate, immutable system from the data warehouse, because access log tampering is itself a compliance failure.
Our data architecture consulting practice designs privacy-compliant analytics architectures. If your organisation has GDPR, CCPA, or HIPAA requirements, contact us to discuss how to build compliance into your data architecture.