GDPR Data Architecture: Technical Requirements for Compliance at Scale

GDPR compliance is not primarily a legal problem — it is a data architecture problem. The rights it creates (access, rectification, erasure, portability) can only be operationalised if personal data is identifiable, locatable, and structured for modification. Most organisations that approach GDPR compliance through policy alone discover, when they receive their first data subject request, that their data architecture does not support the response the regulation requires.

The Architecture Implications of Data Subject Rights

**Right of access (Article 15)** requires providing a copy of all personal data held about an individual. The architecture implication is that personal data must be systematically discoverable: when a subject access request arrives with a name and email, the system must be able to locate every record associated with that individual across all data stores. This requires consistent individual identifiers across systems, a data inventory that covers all stores containing personal data, and a query mechanism that can retrieve all records for a specific identifier.

Organisations with fragmented data architecture — dozens of systems, inconsistent customer identifiers, undocumented data flows — cannot respond to SARs within the regulation's one-month timeframe without significant manual effort. The time cost of SAR response is one of the most concrete drivers of data architecture investment in privacy-regulated organisations.
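The lookup described for the right of access can be sketched as follows. This is a minimal illustration, assuming every store is listed in a data inventory together with the column that carries a shared subject identifier. The inventory contents, table names, and the `find_subject_records` helper are hypothetical, and an in-memory SQLite database stands in for the real data stores. Resolving a requester's name and email to the subject identifier is a separate step that is not shown.

```python
import sqlite3

# Hypothetical data inventory: table name -> column holding the subject identifier.
DATA_INVENTORY = {
    "crm_contacts": "subject_id",
    "orders": "customer_id",
    "support_tickets": "requester_id",
}

def find_subject_records(conn, subject_id):
    """Return every row linked to subject_id, grouped by inventoried table."""
    conn.row_factory = sqlite3.Row
    results = {}
    for table, id_column in DATA_INVENTORY.items():
        # Table and column names come from the trusted inventory, never from user input.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {id_column} = ?", (subject_id,)
        ).fetchall()
        results[table] = [dict(row) for row in rows]
    return results

def build_demo_db():
    """Create a small in-memory database standing in for the real stores."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crm_contacts (subject_id TEXT, email TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id TEXT);
        CREATE TABLE support_tickets (ticket_id INTEGER, requester_id TEXT);
        INSERT INTO crm_contacts VALUES ('u-1', 'ada@example.com');
        INSERT INTO orders VALUES (10, 'u-1'), (11, 'u-1'), (12, 'u-2');
    """)
    return conn

if __name__ == "__main__":
    print(find_subject_records(build_demo_db(), "u-1"))
```

The sketch only works because each table exposes the same identifier value. Where systems use inconsistent identifiers, an identity-resolution step has to come first, which is where most of the manual effort tends to go.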

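**Right to erasure (Article 17)** requires deleting personal data when one of the article's grounds applies, for example when the data is no longer necessary for the purpose it was collected for, or when consent has been withdrawn and no other lawful basis remains. In analytics systems built on append-only warehouses, point-deletion is architecturally unnatural. There are two standard approaches: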

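The pseudonymisation approach: analytics tables contain a surrogate key instead of direct identifiers. Erasure is implemented by deleting the mapping from the surrogate key to the individual's identity, so the analytics records can no longer be linked to that person through the key, without physically deleting rows. This only satisfies the erasure requirement if the remaining attributes cannot be used to re-identify the individual; quasi-identifiers such as postcode, date of birth, or free-text fields may need to be generalised or removed as well. Where that condition holds, the approach preserves the aggregate analytical value of historical data.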

The physical deletion approach: when erasure is requested, execute DELETE statements against all relevant tables and rebuild derived tables and aggregations that included the deleted records. This is architecturally clean but operationally expensive, particularly for tables with complex downstream dependencies.

**Right to rectification (Article 16)** requires correcting inaccurate personal data. The architecture implications are similar to erasure: the personal data needs to be locatable and modifiable. For analytics tables derived from source systems, the correction should be applied at the source and the analytics tables rebuilt — not applied to analytics tables independently, which would create divergence between source and analytics.
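A minimal sketch of the "correct at source, then rebuild" pattern, assuming a single source table and one derived aggregate. The table names `source_customers` and `analytics_customers_by_country` are hypothetical, and a real pipeline would trigger the rebuild through its orchestration tool rather than inline.

```python
import sqlite3

def rebuild_analytics(conn):
    """Recreate the derived table from the source of truth."""
    conn.executescript("""
        DROP TABLE IF EXISTS analytics_customers_by_country;
        CREATE TABLE analytics_customers_by_country AS
            SELECT country, COUNT(*) AS customers
            FROM source_customers
            GROUP BY country;
    """)

def rectify_country(conn, customer_id, new_country):
    """Apply the correction at the source, then rebuild the derived table."""
    conn.execute(
        "UPDATE source_customers SET country = ? WHERE id = ?",
        (new_country, customer_id),
    )
    rebuild_analytics(conn)
```

Because the analytics table is never edited directly, it cannot drift from the source: any later rebuild reproduces the corrected value.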

**Right to portability (Article 20)** requires providing personal data in a structured, commonly used, and machine-readable format. It covers data the individual has provided, where processing is based on consent or contract and carried out by automated means. The architecture implication is that personal data must be exportable at the individual level in a structured format (JSON, CSV). This is typically simpler than erasure: the same SAR query that locates records for an individual can format them for export.
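As a sketch of the export step, assuming the records for one individual have already been collected into a dictionary keyed by table name. The `export_subject_data` function and the `format_version` field are hypothetical choices for illustration.

```python
import json
from datetime import date

def export_subject_data(records_by_table):
    """Serialise per-table records for one individual as a JSON document."""
    return json.dumps(
        {"format_version": 1, "data": records_by_table},
        indent=2,
        sort_keys=True,
        default=str,  # dates and other non-JSON types are written as strings
    )

if __name__ == "__main__":
    sample = {"orders": [{"order_id": 10, "placed_on": date(2024, 1, 5)}]}
    print(export_subject_data(sample))
```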

Data Inventory and Records of Processing Activities

GDPR Article 30 requires maintaining records of processing activities (RoPA) — documentation of what personal data is processed, for what purpose, by whom, how long it is retained, and with whom it is shared. The RoPA is the formal expression of the data inventory that privacy-compliant architecture requires.

Building a RoPA from scratch in a large organisation with a complex data estate is an exercise in understanding the data architecture: where does personal data come from, where does it go, what transformations does it pass through? This exercise frequently surfaces data flows that compliance and legal teams were unaware of — shadow IT databases, undocumented third-party integrations, analytics tables containing personal data that were never declared to compliance.

The RoPA should be a living document that is updated when new processing activities are established. In practice, this requires process integration: changes to data systems (new integrations, new tables, new data sources) should trigger a privacy review that assesses whether the change affects the RoPA. Building this into the CI/CD or data platform governance process is more reliable than relying on engineers to proactively report changes to compliance teams.
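One way to wire this into CI is a check that compares the warehouse schema against the tables declared in the RoPA. The sketch below is illustrative only: the declared-table set, the column-name heuristic, and the `undeclared_personal_tables` function are all assumptions, and a name-based heuristic will produce false positives (for example `product_name`) that need human review.

```python
# Hypothetical RoPA extract: tables already declared as holding personal data.
ROPA_DECLARED_TABLES = {"crm_contacts", "orders"}

# Hypothetical naming heuristic for columns that probably hold personal data.
PERSONAL_COLUMN_HINTS = ("email", "name", "phone", "address", "birth")

def undeclared_personal_tables(schema, declared=ROPA_DECLARED_TABLES):
    """Return tables whose columns look personal but are missing from the RoPA.

    schema maps each table name to a list of its column names.
    """
    flagged = {}
    for table, columns in schema.items():
        hits = [
            col for col in columns
            if any(hint in col.lower() for hint in PERSONAL_COLUMN_HINTS)
        ]
        if hits and table not in declared:
            flagged[table] = hits
    return flagged
```

A CI job could fail the build, or open a privacy review ticket, whenever the function returns a non-empty result for a proposed schema change.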

Consent and Lawful Basis Management

GDPR requires that each processing activity has a documented lawful basis: consent, contract, legitimate interest, legal obligation, vital interests, or public task. For data collected on the basis of consent, the consent needs to be recorded (when it was given, what it was given for, by whom) and honoured: if consent is withdrawn, processing on that basis must cease.

In analytics systems, managing consent requires:

**Consent state storage** — a consent management system that records the consent state for each individual for each processing purpose. This needs to be queryable by the analytics pipeline: when processing personal data for a consent-based purpose, the pipeline should check that valid consent exists for the individual being processed.

**Consent change propagation** — when a user withdraws consent, that withdrawal needs to propagate to the analytics system. Records for that individual that were processed under the withdrawn consent may need to be deleted or reclassified.

**Consent-scoped data access** — analytics queries against personal data should, in principle, only return records where the relevant consent or lawful basis applies. In practice, most analytics systems do not implement consent-scoped queries — they rely on the data having been collected with consent before landing in the analytics system. The design implication is that data collected without a valid lawful basis should be excluded from the analytics pipeline at ingestion, not at query time.
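The three requirements above can be sketched together: a consent store keyed by individual and purpose, withdrawal recorded as a state change, and a filter applied at ingestion. All class and function names here are hypothetical, and an in-memory dictionary stands in for a real consent management system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConsentRecord:
    subject_id: str
    purpose: str
    granted: bool  # False represents a withdrawal

class ConsentStore:
    """Holds the latest consent decision per (subject, purpose)."""

    def __init__(self):
        self._state = {}

    def record(self, rec):
        # The most recent decision replaces any earlier one.
        self._state[(rec.subject_id, rec.purpose)] = rec.granted

    def has_consent(self, subject_id, purpose):
        # No record at all is treated as no consent.
        return self._state.get((subject_id, purpose), False)

def filter_for_ingestion(events, store, purpose):
    """Keep only events whose subject currently consents to the given purpose."""
    return [e for e in events if store.has_consent(e["subject_id"], purpose)]
```

Defaulting to "no consent" when no record exists keeps the pipeline from ingesting data whose status is unknown. The sketch does not cover the clean-up of records that were ingested before a withdrawal; that would need a separate deletion or reclassification job.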

Technical Measures for GDPR Compliance

GDPR Article 25 requires data protection by design and by default — technical measures that implement privacy protections structurally. The specific technical measures that apply to data architecture:

**Encryption at rest and in transit** — standard baseline. Personal data in the warehouse should be encrypted at rest; connections to the warehouse should be encrypted in transit. Most cloud warehouses (Snowflake, BigQuery, Redshift) provide encryption at rest as a built-in feature, often enabled by default; verifying that it is enabled and managing key rotation is an operational configuration requirement, not a custom implementation.

**Access controls** — personal data in the analytics system should be accessible only to processes and users with a legitimate need. Column-level masking (showing masked values to users without the appropriate role, full values to authorised users) is appropriate for highly sensitive fields (financial account numbers, health data, government identifiers). Row-level security restricts access based on the user's relationship to the data (a regional user can only see records for their region).

**Audit logging** — access to personal data should be logged, recording who queried a table containing personal data, when, and what they queried. Audit logs are needed to demonstrate compliance in an investigation and to detect access patterns that indicate misuse.

**Data minimisation in transformation** — the transformation pipeline should strip direct identifiers from tables that do not require them for their intended use. An analytics table of user behaviour events does not need to contain name and email address if it will only be used for aggregate behaviour analysis.
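Three of the measures above (column masking, audit logging, and data minimisation) can be illustrated in application code. This is a sketch under stated assumptions: the column sets, role names, and functions are hypothetical, and in a real warehouse masking and access logging would normally be configured through the platform's native features rather than implemented by hand.

```python
import logging

audit_log = logging.getLogger("pii_audit")

# Hypothetical column classifications and roles.
DIRECT_IDENTIFIERS = {"name", "email"}
MASKED_FOR_UNPRIVILEGED = {"account_number"}
PRIVILEGED_ROLES = {"privacy_officer"}

def minimise(row):
    """Drop direct identifiers from a row destined for aggregate analysis."""
    return {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}

def mask_value(value):
    """Show only the last four characters; mask short values entirely."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]

def read_row(row, user, role):
    """Log the access, then return the row masked according to the role."""
    audit_log.info("user=%s role=%s columns=%s", user, role, sorted(row))
    if role in PRIVILEGED_ROLES:
        return dict(row)
    return {
        k: mask_value(v) if k in MASKED_FOR_UNPRIVILEGED else v
        for k, v in row.items()
    }
```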

Our data architecture practice designs GDPR-compliant data systems for organisations operating under EU data protection requirements — contact us to discuss the technical architecture for your GDPR compliance programme.
