BlogData Engineering

Fivetran Setup and Configuration: Getting Data Moving in Hours, Not Weeks

James Okafor
James Okafor
Lead Data Engineer
·November 17, 202711 min read

Fivetran is a managed data connector service that automates the extraction and loading of data from SaaS applications, databases, and event streams into your cloud data warehouse. This guide covers the practical setup process — connector configuration, warehouse permissions, schema management, and the operational decisions that determine how well Fivetran serves your analytics stack long-term.

Fivetran is a managed ELT service that automates extraction and loading from hundreds of SaaS applications, databases, and event sources into cloud data warehouses. The core value proposition is simple: instead of building and maintaining custom data connectors — which requires engineering time to build, monitor, fix when APIs change, and upgrade when the warehouse schema evolves — you buy a connector from Fivetran and it handles all of that.

What Fivetran Handles

Fivetran manages the parts of data integration that are repetitive, brittle, and undifferentiated:

**API extraction** — authentication, pagination, rate limiting, and retry handling for SaaS API calls. When Salesforce changes their API, Fivetran updates their Salesforce connector; you do not.

**Schema detection and migration** — when a source adds a column, Fivetran detects the new column and adds it to the destination table. When a column is removed, Fivetran logs the removal and the destination schema reflects the change. This automatic schema management eliminates a large class of pipeline failures.

**Incremental loading** — Fivetran tracks what data has been extracted and loads only new or changed records on each sync. For most connectors, this is handled automatically without configuration.

**Normalisation** — Fivetran normalises source data to a consistent schema in the destination. Each source system has a documented destination schema that analytics engineers can rely on without reverse-engineering the raw structure.

Fivetran Setup Process

Setting up a Fivetran connector involves:

**1. Create a connection in the Fivetran dashboard** — select the source connector type, provide authentication credentials (OAuth tokens, API keys, database connection strings), and configure the sync schedule.

**2. Configure warehouse destination** — Fivetran needs a destination: Snowflake, BigQuery, Redshift, Databricks, or another supported warehouse. Configure a dedicated Fivetran destination schema (e.g., fivetran_salesforce) where the connector will write data.

**3. Configure warehouse permissions** — Fivetran requires CREATE TABLE, INSERT, UPDATE, DELETE, and CREATE SCHEMA privileges in the destination database. For Snowflake, create a dedicated FIVETRAN role with these privileges:

CREATE ROLE FIVETRAN;

GRANT USAGE ON DATABASE ANALYTICS TO ROLE FIVETRAN;

GRANT CREATE SCHEMA ON DATABASE ANALYTICS TO ROLE FIVETRAN;

**4. Initial historical sync** — on first setup, Fivetran performs a full historical sync of all data in the connected source. For large sources (Salesforce with millions of records, large databases), this initial sync may take hours to days. Monitor it and ensure the destination warehouse has capacity.

**5. Configure sync frequency** — Fivetran supports sync frequencies from every 5 minutes to daily. Higher sync frequency increases Fivetran consumption (MAR — Monthly Active Rows) and may increase warehouse loading costs. Match sync frequency to actual data freshness requirements.

Connector Types and Their Configurations

Fivetran's connector library spans several categories:

**SaaS application connectors (Salesforce, HubSpot, Marketo, Google Analytics)** — typically configured with OAuth or API key authentication. These connectors pull from REST APIs; sync frequency is limited by API rate limits.

**Database connectors (PostgreSQL, MySQL, SQL Server)** — connect to operational databases via Change Data Capture (CDC) using the database's transaction log, or via incremental column-based sync. CDC is preferred for production databases as it minimises load; incremental sync works for read replicas.

**File connectors (S3, GCS, SFTP)** — ingest files dropped in cloud storage or FTP locations. File format (CSV, JSON, Parquet) and schema must be consistent; Fivetran reads files on schedule.

**Event connectors (Segment, Amplitude, Mixpanel)** — ingest event data from product analytics platforms. Typically high-volume; sync frequency and volume management matter more for cost.

Schema Management Strategy

Fivetran writes to a destination schema that mirrors the source structure. For dbt-based analytics engineering workflows, the standard approach is:

1. Fivetran writes to raw landing schemas: raw.salesforce, raw.stripe, raw.postgresql_ops

2. dbt staging models read from these raw schemas via source() references

3. dbt never writes back to raw schemas

This clean separation means Fivetran and dbt own different layers. Fivetran's schema changes (new columns from the source) are reflected in the raw schema and captured by dbt's source() tests if the new columns are expected. Unexpected schema changes surface as source test failures rather than silent data corruption.

MAR Pricing and Cost Management

Fivetran's pricing is based on Monthly Active Rows (MAR) — the number of rows synced or updated in a month across all connectors. Understanding MAR is critical for cost management:

**High-MAR connectors** — event tables (Segment, analytics events), transaction tables in operational databases with high update rates. These can generate millions of MAR per month.

**Incremental loading optimisation** — for database connectors, CDC generates MAR only for rows that actually change, whereas incremental column-based sync queries for rows with an updated_at above the last sync timestamp. CDC is typically more MAR-efficient for tables with high read vs. write ratios.

**Sync frequency vs. MAR** — higher sync frequency on frequently updated sources generates more MAR because the same rows may be synced multiple times as they are updated between syncs. For sources where data freshness requirements are hourly rather than sub-hourly, reducing sync frequency can meaningfully reduce MAR.

**Historical re-syncs** — avoid triggering full re-syncs on large connectors unnecessarily. A full re-sync counts all rows in the source as MAR, regardless of whether they have changed.

Fivetran Transformations vs dbt

Fivetran offers a built-in transformation layer (Fivetran Transformations) for basic SQL transformations applied after loading. For analytics engineering teams using dbt, this creates an overlapping capability:

The standard recommendation: use dbt for all transformation logic; use Fivetran only for extraction and loading. Mixing transformation logic between Fivetran and dbt creates complexity — transformations in two systems with different testing, documentation, and deployment processes.

The only exception: Fivetran's schema normalisation (basic renaming and type casting applied by the connector itself) is acceptable. Fivetran-defined transformations that go beyond normalisation and encode business logic should be migrated to dbt.

Our data architecture practice designs data ingestion architectures including Fivetran deployment and integration with dbt for enterprise analytics teams — contact us to discuss your data integration stack.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →