How Fivetran extracts, loads, and normalises data from source systems — the connector architecture, the normalised schema pattern, how incremental syncs work, the log-based CDC approach for database connectors, and the configuration decisions that determine cost, reliability, and schema compatibility.
Fivetran is the dominant managed data connector service. Understanding how it works internally — not just how to configure it — determines whether you use it correctly, debug connector issues efficiently, and make informed decisions about schema design, sync frequency, and cost management.
How Fivetran Works
Fivetran's core job is to move data from source systems into your data warehouse reliably and with minimal configuration. It does this by:
1. Connecting to the source system using an appropriate extraction method
2. Extracting data incrementally (new or changed records since the last sync)
3. Normalising the data into a defined schema structure
4. Loading the data into the target warehouse in the raw landing schema
The extraction method varies by connector type and determines what kinds of changes Fivetran can detect.
Extraction Methods
### API-Based Extraction
For SaaS applications (Salesforce, HubSpot, Stripe, Google Analytics, Zendesk), Fivetran uses the application's REST or GraphQL API. The API returns records that have been created or modified since the last sync, typically by filtering on a cursor field (updated_at, modified_date, or an API-specific cursor parameter).
Limitations of API extraction:
- API rate limits constrain how fast Fivetran can extract data. High-volume sources can hit rate limits, causing syncs to slow or fail.
- APIs do not always expose deleted records. If a record is deleted in the source system, Fivetran may not know it was deleted — it simply stops appearing in API responses. Fivetran handles this with soft-delete tracking where possible: if a record that previously synced stops appearing, Fivetran adds an _fivetran_deleted = true flag.
- API structure changes. When an API version is deprecated or a field is added/removed, Fivetran updates its connector to handle the change, but there may be gaps during connector updates.
### Database Replication (CDC)
For relational database sources (PostgreSQL, MySQL, SQL Server, Oracle), Fivetran uses log-based Change Data Capture (CDC) where available. CDC reads the database's transaction log directly, capturing every insert, update, and delete as it occurs.
Advantages of CDC:
- Captures true deletes (not just soft deletes)
- Captures row updates precisely as they happen
- Does not add query load to the source database (reads the transaction log, not the tables)
- Can be near-real-time for low-latency requirements
Requirements for CDC:
- PostgreSQL: requires logical replication to be enabled and a replication slot configured
- MySQL: requires binary logging enabled in row format
- SQL Server: requires SQL Server replication licence and appropriate permissions
Not all database hosting environments allow CDC configuration (some managed RDS configurations restrict replication slot creation). When CDC is not available, Fivetran falls back to high-watermark polling: querying for records where the updated_at timestamp is greater than the last sync time.
**The CDC replication slot problem:** For PostgreSQL, CDC requires maintaining a replication slot. If the Fivetran consumer falls behind (due to sync pause or network issues), the replication slot accumulates WAL (write-ahead log) data that the database cannot purge. This can consume disk space rapidly on high-write databases. Monitor replication slot lag and set a maximum retention to prevent disk exhaustion.
Fivetran's Normalised Schema
Fivetran loads data into a normalised schema that mirrors the source system's structure. Each source table becomes a table in the destination schema. Fivetran adds system columns to every table:
**_fivetran_synced:** A timestamp indicating when the row was last processed by Fivetran. This is the column used for source freshness checking in dbt (loaded_at_field: _fivetran_synced).
**_fivetran_deleted:** A boolean flag set to true for rows that have been deleted from the source. Present for tables where Fivetran can detect deletions.
**_fivetran_id:** A stable row identifier generated by Fivetran, present for tables without a natural primary key.
The normalised schema loads data as-is from the source. Business logic, joining, and transformation are not Fivetran's job — that is dbt's layer. Fivetran's job is to make source data available reliably; dbt's job is to make it analytically useful.
Schema Evolution
Source schemas change over time. When a new column is added to a Salesforce object, it needs to appear in the Fivetran-synced table. Fivetran handles schema evolution automatically for most cases:
**New columns added to source:** Fivetran adds the new column to the destination table on the next sync. dbt models referencing the table may not automatically include the new column (if they use SELECT star, they will; if they specify columns, the new column is simply not yet in the model). This is a reason to track schema changes in your monitoring — new source columns may need to be evaluated for analytical relevance.
**Column type changes:** Type changes in the source (a VARCHAR becoming INT) can cause Fivetran to change the column type in the destination. This can break downstream dbt models that assumed the original type. Fivetran typically widens types rather than narrowing — adding precision rather than reducing it — but breaking type changes do occur.
**Table deletions:** If a source table is removed (the API no longer exposes it, or a database table is dropped), Fivetran stops syncing it but does not delete the destination table. The destination table remains with the last-synced data. This is intentional — Fivetran does not delete destination data to avoid data loss.
Sync Frequency and Cost
Fivetran charges based on Monthly Active Rows (MAR) — the number of distinct rows synced in a month, regardless of how many times they are synced. This pricing model has implications for sync frequency configuration:
**For tables with low update frequency:** Reducing sync frequency from hourly to daily for tables that update once per day does not change MAR (the same rows are synced whether you check hourly or daily) but may reduce Fivetran's API call volume and source system load.
**For tables with high row counts and frequent updates:** MAR scales with the number of rows that change. A 10 million row table where 10% of rows update daily contributes 1 million MAR per day. Understanding your MAR drivers helps anticipate cost growth.
**Connector priority:** Fivetran allows prioritising high-priority connectors to ensure they sync before lower-priority ones when scheduler capacity is constrained. Critical operational connectors should be configured as higher priority than analytical-only connectors.
Fivetran Transformations (dbt Integration)
Fivetran offers native dbt integration — you can configure dbt Cloud jobs to run automatically after specific connectors complete. The sequence: Fivetran connector syncs → dbt job triggers → analytical models refresh.
This integration reduces the need for a separate orchestration layer (Airflow/Prefect) for simple pipelines where the only trigger for dbt runs is Fivetran sync completion.
For more complex pipelines with multiple connectors, cross-connector dependencies, or custom triggering logic, a dedicated orchestration layer is still necessary.
Alternatives to Fivetran
**Airbyte:** Open-source alternative. Similar connector breadth, lower cost (free if self-hosted), more configuration overhead. Airbyte Cloud (managed) offers a similar experience to Fivetran at lower cost.
**Stitch:** Mid-market alternative with fewer connectors and lower pricing. Appropriate for smaller organisations with standard connector needs.
**Singer:** Open-source framework for building custom connectors. Not a managed service — requires engineering effort to build, maintain, and operate connectors. Appropriate for sources without managed connector support.
**Custom ETL:** Building and maintaining custom ingestion pipelines. Higher engineering cost, maximum flexibility. Appropriate for sources with unusual extraction requirements or for organisations that want to avoid the MAR-based pricing of managed connectors.
Our data engineering consulting practice designs ingestion architectures including Fivetran deployment and optimisation — contact us to discuss data ingestion for your environment.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →