
Data Architecture for Financial Services: Requirements, Patterns, and Pitfalls

Austin Duncan
Managing Director & Principal Data Architect
June 9, 2026 · 11 min read

Financial services organisations face data architecture requirements that most enterprise platforms are not designed for: regulatory data lineage, real-time risk, strict access controls, and the need to reconcile trading, risk, and finance data across systems that were never designed to talk to each other.

The quick answer

Financial services data architecture is harder than general enterprise data architecture for three specific reasons. First, regulatory requirements impose data lineage, retention, and access control obligations that most data platform designs do not accommodate by default. Second, real-time risk and trading systems require data latency measured in milliseconds, which batch-oriented analytics infrastructure cannot serve. Third, the data itself (trading positions, risk calculations, client data, and regulatory reporting) originates in systems that were built independently across decades of acquisition and organic growth and were never designed to integrate. The organisations that get financial services data architecture right address all three of these constraints explicitly at the design stage. Those that treat it as standard enterprise data architecture and discover the constraints mid-build pay for the rework.

The regulatory data requirements that change everything

Financial services organisations operate under a stack of data-specific regulatory obligations that most enterprises do not face. In the US, these come from the SEC, FINRA, OCC, FDIC, CFTC, and state-level regulators. Elsewhere, they come from bodies such as the FCA (UK), APRA (Australia), MAS (Singapore), and the ECB/EBA (Europe). These regulators have specific, non-negotiable requirements for how data must be managed, stored, and reported.

**BCBS 239 (Risk Data Aggregation and Risk Reporting).** These Basel Committee principles require that systemically important banks can aggregate risk data accurately and quickly. Among other things, BCBS 239 calls for accuracy and integrity (risk data must be accurate, complete, and subject to appropriate reconciliation), completeness (all material risk data must be captured), timeliness (risk data must be available quickly enough for management decisions), and adaptability (the architecture must support ad-hoc risk reporting). BCBS 239 is one of the most significant regulatory drivers of data architecture investment in global banking.

**Audit trail and lineage requirements.** Regulators require that financial institutions be able to demonstrate — under audit — where any reported figure came from, what transformations it underwent, and who had access to it. This is not a "document your pipelines" requirement; it is a technical requirement for column-level data lineage that is maintained automatically and queryable at any point in time. Data architecture that cannot produce an audit trail for a reported figure within hours of a regulatory request is non-compliant.
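As a rough illustration of what "queryable lineage" means, the sketch below walks a lineage graph upstream from a reported figure to its raw sources. The table and column names and the `LINEAGE` mapping are hypothetical; a real platform would read these edges from its catalog or lineage tool rather than from a hard-coded dictionary, and would also version them so the graph can be queried as of a past date.

```python
from collections import deque

# Hypothetical lineage edges: each column maps to the columns it is derived from.
LINEAGE = {
    "reg_report.total_exposure": ["gold.exposure_by_desk.exposure"],
    "gold.exposure_by_desk.exposure": [
        "silver.positions.quantity",
        "silver.prices.price",
    ],
    "silver.positions.quantity": ["bronze.trades.qty"],
    "silver.prices.price": ["bronze.market_data.px"],
}


def trace_upstream(column, lineage):
    """Return every column that feeds `column`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(column, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a column can be reached by more than one path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


sources = trace_upstream("reg_report.total_exposure", LINEAGE)
for name in sources:
    print(name)
```

The same traversal run in the opposite direction (from a source column to everything derived from it) is what a right-to-erasure or impact-analysis query needs.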

**Data retention.** Financial services data must be retained for specific periods (at least 5 years for most records under MiFID II, extendable to 7 years at the regulator's request, and longer under certain circumstances). Retention must be immutable: data cannot be modified once the retention period starts. Cloud data platforms must be configured with write-once, immutable storage for regulated data, with audit logs that demonstrate data has not been modified.
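The write-once behaviour can be illustrated with a toy in-memory store. This is only a sketch of the semantics, not a storage implementation: the class name, the method names, and the 5-year value used in the example are illustrative assumptions, and in practice the guarantee comes from the storage layer's own write-once (WORM) retention feature rather than from application code.

```python
from datetime import date


class WriteOnceStore:
    """Toy store: a key can be written once, never overwritten,
    and deleted only after its retention period has elapsed."""

    def __init__(self, retention_years):
        self.retention_years = retention_years
        self._records = {}

    def put(self, key, payload, written_on):
        if key in self._records:
            raise PermissionError(f"{key} is already stored and cannot be overwritten")
        self._records[key] = (payload, written_on)

    def retain_until(self, key):
        _, written_on = self._records[key]
        target_year = written_on.year + self.retention_years
        try:
            return written_on.replace(year=target_year)
        except ValueError:
            # Written on 29 February and the target year is not a leap year.
            return written_on.replace(year=target_year, day=28)

    def delete(self, key, today):
        if today < self.retain_until(key):
            raise PermissionError(f"{key} is under retention until {self.retain_until(key)}")
        del self._records[key]


# Illustrative retention period; the real value depends on the applicable rule.
store = WriteOnceStore(retention_years=5)
store.put("trade-0001", {"qty": 100}, written_on=date(2026, 3, 1))
print(store.retain_until("trade-0001"))
```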

**Client data privacy.** Client financial data is personal data under GDPR, CCPA, and equivalent regulations. Access to client data must be governed, logged, and auditable. Data minimisation requirements limit how long and in what form client data can be retained. Right-to-erasure requests require that client data be deleted across all downstream systems, except where a statutory retention obligation still applies, and that in turn requires complete data lineage to identify every system holding a copy.

**Regulatory reporting.** Capital and stress-test reporting (CRR/CRD in Europe, DFAST in the US), liquidity reporting (LCR, NSFR), and transaction reporting (MiFID II, EMIR) all require data that is accurate, reconciled, and produced to defined schedules. The data architecture must be designed to produce these reports reliably, with the lineage and reconciliation documentation that regulators expect.
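"Reconciled" here usually means that two independently produced views of the same balances agree within a stated tolerance, with every exception listed. A minimal sketch, assuming two hypothetical feeds keyed by account and a tolerance of one cent:

```python
from decimal import Decimal


def reconcile(source_a, source_b, tolerance=Decimal("0.01")):
    """Compare two {key: amount} views and return the list of breaks."""
    breaks = []
    for key in sorted(set(source_a) | set(source_b)):
        a = source_a.get(key)
        b = source_b.get(key)
        if a is None or b is None:
            breaks.append((key, a, b, "missing"))
        elif abs(a - b) > tolerance:
            breaks.append((key, a, b, "mismatch"))
    return breaks


# Hypothetical balances from a risk system and a finance ledger.
risk_view = {"ACC-1": Decimal("1000.00"), "ACC-2": Decimal("250.00")}
finance_view = {"ACC-1": Decimal("1000.00"), "ACC-2": Decimal("250.75"), "ACC-3": Decimal("10.00")}

for item in reconcile(risk_view, finance_view):
    print(item)
```

Using `Decimal` rather than floats avoids rounding noise showing up as false breaks.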

The real-time risk requirement

Risk management in financial services — particularly for trading, lending, and insurance businesses — requires data that is current to the position, not current to yesterday's batch. A trading desk's risk exposure changes with every executed trade. An underwriter's portfolio risk changes with every new policy written. A credit portfolio's risk changes with every market movement.

The analytics infrastructure that serves these use cases cannot be built on overnight batch pipelines. It requires:

**Real-time position data.** Trades, positions, and transactions must be available in the risk system within seconds of execution — not in the next morning's extract.

**Real-time market data.** Risk calculations that incorporate market prices (equity prices, interest rates, FX rates, credit spreads) need the current market data, not yesterday's closing prices.

**Intraday risk calculation.** Risk metrics (VaR, stress test results, exposure calculations) must be recalculated continuously throughout the trading day, not just at end-of-day batch.

**Low-latency serving.** Risk dashboards used by traders, risk managers, and executives must return results in sub-second time — not in the 10–60 seconds that analytical warehouse queries typically take.

The architecture that serves this requirement is different from the analytics architecture that serves reporting and BI. It typically involves:

- Event streaming (Kafka or Azure Event Hubs) for real-time trade and market data ingestion

- In-memory data stores (Redis, Apache Ignite) for position data that needs sub-millisecond read latency

- A separate real-time calculation engine for risk metrics

- A downstream aggregation layer that feeds risk dashboards

This real-time layer sits alongside (not instead of) the analytical warehouse that serves management reporting and regulatory reporting. They serve different latency requirements from the same underlying data.
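The core idea of the real-time layer (apply each event as it arrives, keep current state in memory, answer reads from that state) can be sketched without any streaming infrastructure. The `Trade` and `PositionBook` names below are hypothetical, and the method calls stand in for messages that would arrive from the event stream in a real deployment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    desk: str
    instrument: str
    quantity: int  # signed: positive = buy, negative = sell


class PositionBook:
    """In-memory positions and prices, updated one event at a time."""

    def __init__(self):
        self.positions = defaultdict(int)  # (desk, instrument) -> net quantity
        self.prices = {}                   # instrument -> latest price

    def on_trade(self, trade):
        self.positions[(trade.desk, trade.instrument)] += trade.quantity

    def on_price(self, instrument, price):
        self.prices[instrument] = price

    def exposure(self, desk):
        """Net market value for a desk at the latest known prices.

        Raises KeyError if a held instrument has no price yet,
        rather than silently valuing it at zero.
        """
        return sum(
            qty * self.prices[instrument]
            for (d, instrument), qty in self.positions.items()
            if d == desk
        )


book = PositionBook()
book.on_trade(Trade("rates", "ABC", 100))
book.on_trade(Trade("rates", "ABC", -40))
book.on_price("ABC", 10.0)
print(book.exposure("rates"))
```

Because exposure is computed from current state rather than from a query over history, a read costs the same whether the desk has executed ten trades or ten million.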

The data integration challenge

Financial services firms typically have among the most complex data landscapes of any industry — particularly at institutions that have grown through acquisition. A mid-size regional bank might have: a core banking system from the 1980s, a credit risk system from the 1990s acquired in a merger, a trading system from the 2000s, and a series of more modern systems layered on top. Each system has its own customer ID, its own product hierarchy, its own transaction format.

The integration challenges this creates:

**Customer identity resolution.** The same client may be a customer in the retail banking system (customer ID: 12345), a counterparty in the trading system (counterparty ID: CP-789), and a borrower in the lending system (borrower ID: BRW-001). Reconciling these three records to a single customer view requires master data management, and in financial services this is often among the most complex MDM programmes in any industry.
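Once a matching process has decided that two records refer to the same client, the links can be held in a cross-reference structure. The sketch below uses a union-find (disjoint-set) structure for that; the `IdentityGraph` class is hypothetical, and the matching step itself (comparing names, dates of birth, legal entity identifiers, and so on) is assumed to happen elsewhere.

```python
class IdentityGraph:
    """Groups (system, id) records that have been matched to the same client."""

    def __init__(self):
        self._parent = {}

    def _find(self, record):
        self._parent.setdefault(record, record)
        while self._parent[record] != record:
            # Path halving keeps lookups fast as clusters grow.
            self._parent[record] = self._parent[self._parent[record]]
            record = self._parent[record]
        return record

    def link(self, a, b):
        """Record that `a` and `b` are the same client."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[max(root_a, root_b)] = min(root_a, root_b)

    def same_client(self, a, b):
        return self._find(a) == self._find(b)

    def cluster(self, record):
        """All known records for the client that `record` belongs to."""
        root = self._find(record)
        return sorted(r for r in list(self._parent) if self._find(r) == root)


graph = IdentityGraph()
graph.link(("retail", "12345"), ("trading", "CP-789"))
graph.link(("trading", "CP-789"), ("lending", "BRW-001"))
print(graph.cluster(("lending", "BRW-001")))
```

Links are transitive: the retail and lending records were never matched directly, but both end up in the same cluster through the trading record.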

**Product hierarchy reconciliation.** What a retail bank calls a "product" (a mortgage type, a savings account category) and what the risk system calls a "product" (an instrument class) are different taxonomies that need to be mapped to a canonical hierarchy for cross-functional reporting.

**Transaction format normalisation.** Trade records, payment transactions, loan drawdowns, and insurance premiums are all financially material events with very different data structures. Building a canonical transaction model that can represent all of these consistently is a significant data modelling challenge.
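A canonical model usually comes down to one shared record type plus one adapter per source system. The sketch below is an assumption-heavy illustration: the field names, the sign convention, and both raw input formats are invented for the example and do not describe any particular system.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalTransaction:
    source_system: str
    source_id: str
    event_type: str   # e.g. "TRADE", "LOAN_DRAWDOWN"
    party_id: str
    amount: Decimal   # signed from the firm's side: positive = cash in
    currency: str
    value_date: date


def from_trade(raw):
    """Adapter for a hypothetical trading-system record."""
    notional = Decimal(str(raw["qty"])) * Decimal(raw["px"])
    signed = -notional if raw["side"] == "BUY" else notional
    return CanonicalTransaction(
        "trading", raw["trade_id"], "TRADE", raw["cpty"],
        signed, raw["ccy"], date.fromisoformat(raw["settle_date"]),
    )


def from_drawdown(raw):
    """Adapter for a hypothetical lending-system record (cash leaves the firm)."""
    return CanonicalTransaction(
        "lending", raw["loan_ref"], "LOAN_DRAWDOWN", raw["borrower"],
        -Decimal(raw["drawn"]), raw["currency"], date.fromisoformat(raw["date"]),
    )


trade = from_trade({
    "trade_id": "T-1", "cpty": "CP-789", "side": "BUY",
    "qty": 100, "px": "10.50", "ccy": "USD", "settle_date": "2026-06-11",
})
drawdown = from_drawdown({
    "loan_ref": "L-9", "borrower": "BRW-001", "drawn": "25000",
    "currency": "USD", "date": "2026-06-09",
})
print(trade)
print(drawdown)
```

Keeping `source_system` and `source_id` on every canonical record preserves the link back to the originating system, which lineage and reconciliation both depend on.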

**System-of-record conflicts.** When two systems have different values for the same attribute — the client's address in the CRM does not match the address in the AML system — there is no simple resolution. The governance process for determining which system is authoritative for which attributes must be designed explicitly.
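Once governance has decided which system owns which attribute, the rule itself is simple to express. In this sketch the `AUTHORITATIVE_SOURCE` table is a made-up example of such a decision, not a recommendation; the function builds a golden record from the owning systems and reports which attributes were in conflict so they can be reviewed.

```python
# Hypothetical governance decision: which system wins for each attribute.
AUTHORITATIVE_SOURCE = {
    "address": "aml",
    "email": "crm",
    "risk_rating": "aml",
}


def golden_record(records_by_system, authority):
    """Pick each attribute from its authoritative system and list conflicts."""
    golden = {}
    conflicts = []
    for attribute, owner in authority.items():
        values = {
            system: record[attribute]
            for system, record in records_by_system.items()
            if attribute in record
        }
        if owner in values:
            golden[attribute] = values[owner]
        if len(set(values.values())) > 1:
            conflicts.append(attribute)
    return golden, conflicts


records = {
    "crm": {"address": "1 Old Street", "email": "client@example.com"},
    "aml": {"address": "2 New Road", "risk_rating": "medium"},
}
golden, conflicts = golden_record(records, AUTHORITATIVE_SOURCE)
print(golden)
print(conflicts)
```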

Data architecture patterns for financial services

**Medallion architecture with financial services extensions.** The Bronze/Silver/Gold pattern works well as the foundation, with financial services-specific additions:

- Bronze layer: raw ingestion with immutable write-once storage for regulatory retention requirements

- Silver layer: identity resolution (customer MDM), product hierarchy mapping, transaction normalisation

- Gold layer: regulatory reporting views, management reporting data products, risk data products

- Separate regulatory archive: immutable, tamper-evident storage with access logs for audit

**Operational Data Store (ODS) for intraday risk.** A separate in-memory or near-real-time store for position data and risk calculations that cannot wait for batch processing. The ODS is updated in real time from event streams; the analytical warehouse is updated in batch from the ODS.

**Segregated access control.** Financial data requires fine-grained access control: front-office traders cannot see back-office client data; compliance cannot see proprietary trading positions; external counterparties cannot see internal risk limits. Microsoft Purview, Unity Catalog, or Snowflake's column-level security implement this at the platform level.
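Platform tools express this kind of policy in their own syntax, but the underlying logic is a mapping from column to permitted roles with a default of deny. The sketch below shows that logic in plain Python; the roles, columns, and `COLUMN_POLICY` table are hypothetical and it does not use any vendor's API.

```python
# Hypothetical policy: which roles may read each column.
COLUMN_POLICY = {
    "trade_id": {"front_office", "back_office", "compliance", "risk"},
    "client_name": {"back_office", "compliance"},
    "position_qty": {"front_office", "risk"},
    "risk_limit": {"risk"},
}

MASK = "***"


def visible_row(row, role, policy):
    """Return the row with every column the role may not read masked.

    Columns missing from the policy are masked for everyone (default deny).
    """
    return {
        column: (value if role in policy.get(column, set()) else MASK)
        for column, value in row.items()
    }


row = {"trade_id": "T-1", "client_name": "Acme Ltd", "position_qty": 500, "risk_limit": 10000}
print(visible_row(row, "front_office", COLUMN_POLICY))
print(visible_row(row, "compliance", COLUMN_POLICY))
```

Default deny matters in practice: a newly added column stays hidden until someone makes an explicit decision about who may read it.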

**Immutable audit logging.** Every data access — not just every data write — must be logged for regulatory compliance. The audit log must be immutable (cannot be modified or deleted) and must be retained for the regulatory retention period. Most cloud data platforms provide access logs, but configuring them for regulatory compliance (capturing the right fields, retaining for the right duration, securing against tampering) requires deliberate design.
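One common way to make a log tamper-evident is to chain entries by hash, so that changing any past entry invalidates every entry after it. The sketch below shows the idea with the standard library only. The `AuditLog` class and its field names are illustrative assumptions, and a hash chain detects tampering rather than preventing it, so the entries still need to live in write-once storage.

```python
import hashlib
import json


class AuditLog:
    """Append-only access log where each entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, user, action, resource, timestamp):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "user": user,
            "action": action,
            "resource": resource,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain and return False if any entry was altered."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.record("analyst1", "READ", "gold.client_positions", "2026-06-09T09:00:00Z")
log.record("analyst2", "READ", "gold.risk_limits", "2026-06-09T09:05:00Z")
print(log.verify())
```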

BI tools for financial services

Financial services organisations typically use Tableau or Power BI for analytics, with specific requirements:

**Tableau** is strong for complex financial visualisations, large extract-based datasets (balance sheet and P&L data at granular levels), and embedded analytics in client portals. Its REST API enables integration with risk systems for automated report distribution.

**Power BI** is strong for Microsoft-ecosystem organisations with Azure-hosted data platforms. Its integration with Azure Synapse, Fabric, and SQL Server is tight; DAX is well-suited for complex financial calculations (variance analysis, time-intelligence calculations over fiscal calendars).

For regulatory reporting specifically — the fixed-format reports required by regulators — neither Tableau nor Power BI is ideal. SQL-based report generation (using stored procedures or dbt models that produce the exact regulatory format) is more reliable than BI-tool-generated regulatory submissions.

See "Power BI vs Tableau" for the detailed platform comparison, and "How to Choose a BI Tool" for the broader decision framework.

Common mistakes in financial services data architecture

**Building analytics infrastructure that cannot scale to risk requirements.** Many financial services firms build a solid analytics platform for management reporting and then discover that the risk management use cases require a fundamentally different architecture. Designing the platform for analytics-only from the start and then trying to extend it to real-time risk is expensive. Design for the full use case spectrum from the start.

**Treating regulatory compliance as a retrofit.** Audit logging, immutable retention, column-level access control, and lineage tracking are significantly easier to implement when designed into the platform from the start. Retrofitting compliance controls onto an existing platform is expensive and often incomplete. Every financial services data platform should be designed with regulatory requirements as first-class requirements, not afterthoughts.

**Underestimating the MDM programme.** Customer identity resolution in a financial services firm that has grown through acquisition is among the most complex MDM challenges in any industry. Organisations that underestimate this scope — treating customer MDM as a 3-month project — regularly discover that the actual programme takes 18–24 months and requires cross-functional authority to resolve system-of-record conflicts between business units that have operated independently for decades.

**Centralising data without resolving system-of-record conflicts.** Building a data warehouse that ingests data from multiple systems without resolving which system is authoritative for which attributes produces a warehouse full of conflicting data that nobody trusts. The governance work — deciding which system wins for each attribute — must happen before the technical build, not during or after.

FAQs

What data platform do most financial services firms use?

Large global banks typically run on a combination: Snowflake or Azure Synapse for analytical workloads, Databricks for complex data engineering and ML, and purpose-built risk systems (Murex, Calypso, OpenGamma) for real-time risk calculation. Mid-size financial services firms more commonly use Snowflake or Azure Synapse as the primary platform with Databricks for specific ML use cases. Legacy Teradata environments are still common at large organisations that have not yet completed their cloud migration.

How do we handle data residency for a multi-jurisdiction financial firm?

Multi-jurisdiction financial firms face data residency requirements in multiple countries simultaneously. The common architecture: separate cloud regions for each jurisdiction with data residency requirements, connected via a shared governance layer. Client data for EU clients stays in EU-region cloud storage; client data for Australian clients stays in Australian-region cloud storage. Risk data that aggregates across jurisdictions is governed to ensure that per-client data does not cross jurisdictional boundaries. This requires careful architecture at the data modelling level — aggregates must be computable without moving underlying client data.
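The "aggregates without moving client data" idea can be sketched as two steps: a summary computed inside each region, and a global roll-up that only ever sees those summaries. The region names, figures, and function names below are hypothetical. Whether a given aggregate may legally leave a jurisdiction is a legal question this sketch does not answer, and very small groups can still identify a client.

```python
# Hypothetical per-region stores; in practice each lives in its own cloud region.
REGIONAL_STORES = {
    "eu": [
        {"client_id": "EU-1", "exposure": 120.0},
        {"client_id": "EU-2", "exposure": 80.0},
    ],
    "au": [
        {"client_id": "AU-1", "exposure": 50.0},
    ],
}


def regional_summary(region, rows):
    """Runs inside the region: only counts and totals are returned."""
    return {
        "region": region,
        "clients": len({row["client_id"] for row in rows}),
        "total_exposure": sum(row["exposure"] for row in rows),
    }


def global_exposure(summaries):
    """Runs centrally: sees regional summaries, never client-level rows."""
    return sum(summary["total_exposure"] for summary in summaries)


summaries = [regional_summary(region, rows) for region, rows in REGIONAL_STORES.items()]
print(summaries)
print(global_exposure(summaries))
```

This works directly for additive measures such as totals and counts. Measures that are not additive across regions, such as a portfolio-level VaR, need a different decomposition and are outside what this sketch covers.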

We have a BCBS 239 remediation programme. Where do we start?

BCBS 239 remediation typically involves: (1) data inventory — what risk data exists, where it lives, and what systems produce it; (2) accuracy assessment — what is the current quality of risk data and where are the gaps; (3) aggregation architecture — can the institution aggregate risk data to produce any required view within regulatory time constraints; (4) lineage — can the institution demonstrate the provenance of any reported risk figure. Most BCBS 239 programmes start with the data inventory and accuracy assessment, which surface the architectural gaps that the subsequent phases address.

Our data architecture consulting practice has designed and implemented data platforms for financial services organisations — banks, capital markets firms, insurance companies, and asset managers. If your organisation has regulatory data architecture requirements, a cloud migration programme, or a data quality problem that is affecting regulatory reporting, book a free 30-minute audit and we will assess your specific situation directly.
