BlogData Engineering

ERP Data Analytics: Extracting Business Intelligence from SAP, Oracle, and Microsoft ERP Systems

Obed Tsimi
Obed Tsimi
Founder & Senior Tableau Architect
·December 1, 202611 min read

The architecture of ERP analytics — extracting data from SAP, Oracle EBS/Fusion, and Microsoft Dynamics, the complexity that makes ERP the hardest source system to integrate, and the data model design that makes ERP data analytically useful rather than just technically present.

ERP systems — SAP, Oracle EBS, Oracle Fusion/Cloud, Microsoft Dynamics 365, Infor, Workday — are the operational backbone of most mid-market and enterprise organisations. They hold the authoritative record for financial transactions, procurement, inventory, HR, and manufacturing operations. They are also, consistently, the hardest source systems to integrate into an analytics stack.

The complexity is not incidental. ERP systems are designed for transactional processing, not analytical reporting. The data models are designed for flexibility, configurability, and operational correctness — not for the join patterns and aggregation structures that make analytical queries fast and intuitive. This guide covers the practical architecture of ERP analytics: extraction, data model design, and the common pitfalls.

Why ERP Integration Is Hard

**Hundreds of tables with opaque relationships.** A full SAP S/4HANA installation has thousands of tables. The relationships between them are documented in SAP's data model documentation, but the relevant tables for any given business process (Procure-to-Pay, Order-to-Cash, Financial Close) are scattered across dozens of tables with names like BSEG, BKPF, VBAP, MARA, EKPO. The semantic meaning of tables is not obvious from the names.

**Highly configurable data models.** Every ERP implementation is configured differently. Chart of accounts, cost centre hierarchies, customer group structures, and status code definitions are implementation-specific. Analytical models built for one SAP implementation may not work for another without modification. This means there are no off-the-shelf analytical models — the transformation logic must be built for each implementation.

**Soft deletes and change tracking via document status.** ERP systems rarely delete records. Instead, they maintain status flags, reversal documents, and posting periods. A cancelled invoice does not disappear from BSEG — it has a reversal document that offsets it. Analytical models must understand these status patterns to compute correct balances.

**Posting period architecture.** Financial transactions are posted to periods (month/year), not just dates. A transaction can be backdated to a prior period even if it is entered today. Analytical models that simply filter by transaction date without understanding posting periods will produce incorrect period-over-period comparisons.

**Large table volumes.** SAP line item tables (BSEG, MSEG, VBAK) at large enterprises contain hundreds of millions to billions of rows. Full extraction of these tables requires careful incremental design to avoid extracting the entire table on every run.

Extraction Architecture by ERP System

**SAP** (S/4HANA, ECC, BW):

Managed connectors (Fivetran SAP connector, Airbyte SAP) extract from SAP via the SAP BAPI or RFC interface. These handle authentication, incremental extraction, and schema mapping. The limitation: most managed connectors extract only a subset of SAP tables — the commonly analytical ones (BSEG, BKPF, KNA1, LFA1, MARA, EKPO) rather than the full SAP schema.

SAP also provides native extraction mechanisms: SAP HANA Smart Data Integration (SDI), SAP Data Intelligence, and the Operational Data Provisioning (ODP) framework. These are SAP-native extraction tools that require SAP expertise to configure but provide access to the full SAP data model and support CDC (change data capture) for large tables.

For custom SAP objects (Z-tables, custom fields added to standard objects), Fivetran and Airbyte support custom table extraction via configuration. Custom SAP development (ABAP programs writing delta records to extraction tables, then read by the connector) is required in complex cases.

Oracle EBS and Oracle Fusion/Cloud:

Oracle EBS (E-Business Suite) exposes data via Oracle Business Intelligence Publisher (BIP) views and database views. Extraction typically uses the Oracle JDBC connector (Fivetran, Airbyte, or custom) querying the EBS schema via the standard Oracle views rather than base tables — this provides a more stable extraction surface than querying base tables directly.

Oracle Fusion Cloud provides a REST API (OTBI — Oracle Transactional Business Intelligence) and BI Publisher reports for data extraction. For bulk extraction, the Oracle Data Masking and Subsetting Pack and Oracle Data Pump can export specific schemas to S3. Oracle Fusion also provides a direct integration with Oracle Autonomous Data Warehouse (ADW) for analytics — if you are on Oracle infrastructure, this is a lower-effort path.

Microsoft Dynamics 365:

Dynamics 365 exposes data via the Dataverse API (for Customer Engagement modules — Sales, Service, Marketing) and Azure Synapse Link (for Finance and Operations modules). Azure Synapse Link continuously replicates Dynamics 365 F&O data to Azure Data Lake Storage or Synapse — this is Microsoft's recommended extraction mechanism and is the most reliable path for organisations running Dynamics 365 F&O.

Fivetran and Airbyte both offer Dynamics 365 connectors for the Dataverse entities. For organisations not using Azure, the Dataverse Web API provides REST access to all Dynamics entities.

Data Model Design for ERP Analytics

Raw ERP tables are not analytically useful. The joins required to answer a business question — "what was accounts payable outstanding as of month end?" — require joining BSEG, BKPF, LFB1, and several configuration tables, applying fiscal period filters, identifying reversal documents, and computing open item balances. This logic must be encapsulated in analytical models.

**The ERP semantic layer.** Build dbt models that translate ERP tables into business-process-oriented semantic models:

**General Ledger model:** Joins BSEG (line items) + BKPF (header) + SKB1/SKA1 (chart of accounts) + T001 (company codes). Applies posting period logic to correctly attribute transactions to fiscal periods. Handles reversal documents (clearing documents that offset prior postings). Outputs: GL entries with business unit, cost centre, account, posting date, fiscal period, and amount.

**Accounts Payable / Accounts Receivable model:** BSEG filtered for vendor/customer open items, joined to LFA1/KNA1 (vendor/customer master data). Applies clearing logic (an open item is cleared when a payment is posted against it) to identify outstanding balances. Requires understanding SAP's dunning levels and payment terms configuration.

**Procurement model:** EKKO (PO headers) + EKPO (PO line items) + MSEG (goods receipts) + BSEG (invoice postings). Produces a procure-to-pay timeline: PO raised → goods received → invoice received → payment made. Each step has a document date, posting date, and status.

**Revenue/Sales model:** VBAK (sales order headers) + VBAP (sales order line items) + VBRK/VBRP (billing documents) + BSEG (accounting entries). Produces order-to-cash pipeline: order raised → delivery → billing → payment.

These models require deep knowledge of the ERP system's data model. The configuration is implementation-specific — account groups, document types, and posting keys vary by organisation. Do not assume that SAP models built for one client will transfer unchanged to another.

Common Analytical Failures

**Ignoring reversal documents.** SAP reversal documents (identified by BKPF.STBLG and BSEG.BUZEI reversal indicators) offset original postings. A report that counts all GL postings without excluding reversals double-counts reversed transactions. Every GL analytical model must handle reversals correctly.

**Reporting on transaction date instead of posting period.** A financial report for March should include all transactions posted to March fiscal periods — including transactions entered in April that were backdated to March. Reporting on transaction date (BSEG.BUDAT) instead of posting period (BSEG.MONAT + BSEG.GJAHR) produces reports that do not match ERP financial statements.

**Not joining to configuration tables.** ERP amounts are stored as amounts in document currency (BSEG.WRBTR) and local currency (BSEG.DMBTR). The currency conversion rate used is stored in the document header (BKPF). Reporting in a single organisation currency requires joining to the exchange rate table (TCURR) or using the local currency amount. Analysts who report on document currency without currency normalisation produce multi-currency comparisons that are not comparable.

**Ignoring deletion flags.** ERP master data (customer, vendor, material) has a deletion flag that marks records as inactive. Analytical models should filter out deletion-flagged records from active dimension tables.

ERP analytics is one of the highest-value data engineering investments an organisation can make. Financial, procurement, and operational data from ERP systems drives the most important business decisions. But the complexity is real and should not be underestimated — the extraction, modelling, and validation work for a production ERP analytics implementation takes months, not weeks.

Our data architecture consulting practice has designed and implemented ERP analytics integrations for SAP, Oracle, and Microsoft Dynamics — contact us to discuss your ERP data integration requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →