BlogBusiness Intelligence

Tableau Extracts vs Live Connections: Making the Right Choice for Each Use Case

Obed Tsimi
Obed Tsimi
Founder & Senior Tableau Architect
·May 16, 202710 min read

The choice between a Tableau extract and a live data source connection is one of the most impactful performance decisions in Tableau architecture. It determines data freshness, query response time, infrastructure load, and extract storage costs. This guide covers when each option is appropriate and how to optimise whichever you choose.

The choice between a Tableau extract and a live data source connection is one of the most consequential configuration decisions in a Tableau deployment. It affects dashboard load time, data freshness, server infrastructure load, and the operational complexity of the analytics environment. Most deployments use a mix of both, with the choice for each data source driven by the specific requirements of the dashboards it serves.

What Extracts Are and How They Work

A Tableau extract is a snapshot of data from a source, stored in Tableau's proprietary columnar format (.hyper). When a dashboard uses an extract, Tableau reads from the local .hyper file rather than querying the source database. The extract is stored on Tableau Server or Tableau Cloud and refreshed on a defined schedule.

The .hyper format is a columnar in-memory database optimised for the kinds of queries Tableau generates — aggregations, filters, and group-by operations on narrow column subsets of wide tables. For most analytical query patterns, it is 10–50x faster than an equivalent query against a general-purpose relational database.

Extracts are not a copy of the entire source table. They can be filtered (extract only the last 2 years of data), aggregated (extract pre-aggregated monthly totals rather than individual transactions), and calculated (materialise commonly used calculated fields as extract columns). These optimisations can significantly reduce extract size and improve query speed.

What Live Connections Are and How They Work

A live connection sends queries from Tableau directly to the source database at dashboard load time. Every time a user opens a dashboard, Tableau generates SQL and sends it to the database. The database executes the query and returns results. Tableau renders the view from those results.

Live connections reflect current data — the moment a record is inserted or updated in the source database, it is visible in Tableau when the next query runs. This makes live connections appropriate for operational dashboards where data currency is critical.

The performance of live connections depends on the source database's query speed for the specific queries Tableau generates. For fast analytical databases (Snowflake, BigQuery, Redshift) with appropriate indexes or partitioning, live connections are often fast enough that extracts provide minimal additional benefit. For slower source databases, live connections can be significantly slower.

When to Use Extracts

**When source database query latency is high.** If queries against the source database take 10–30 seconds, users will not tolerate dashboard load times at that latency. An extract shifts the query from interactive time to scheduled refresh time, delivering consistent sub-second response on the dashboard.

**When the source database cannot handle analytical query load.** Analytical queries from multiple concurrent Tableau users can impose significant load on operational databases. An extract absorbs that load — all queries run against the Tableau Server's local .hyper file, not the source database.

**When calculated fields are expensive to compute at query time.** Calculations like FIXED LOD expressions generate complex SQL that can be slow on large tables. Materialising these calculations in the extract (using Materialise Calculations) shifts computation to extract refresh time.

**When data freshness requirements allow refresh lag.** Most analytical use cases can accept data that is 15 minutes to 24 hours old. Extracts that refresh hourly or daily meet these requirements while providing significant performance improvement over live queries.

**When users need offline access or the source is intermittently available.** Extract data is available even when the source database is unreachable — scheduled maintenance windows, network issues, or connection failures do not affect a user's ability to load an extract-based dashboard.

When to Use Live Connections

**When near-real-time data currency is required.** Operational dashboards that monitor live business processes — current inventory levels, open support ticket queue, active transaction status — need data that reflects the last few minutes, not the last refresh cycle. Live connections on fast analytical sources (Snowflake, BigQuery) often achieve acceptable latency for these use cases.

**When the source is a fast analytical database already optimised for the query patterns.** For Snowflake, BigQuery, or Redshift with appropriate partitioning and clustering, Tableau queries often complete in 1–5 seconds without extracts. The performance benefit of an extract is marginal, and the operational overhead of managing extract refresh schedules adds complexity without proportional value.

**When extract size would be prohibitively large.** Extracts that grow to tens of gigabytes are expensive in Tableau Server storage and slow to refresh. For very large datasets, live connections that use the source warehouse's compute (which scales independently) may be operationally simpler than managing large extracts.

**When source data changes frequently and dashboards need to reflect those changes.** An hourly refresh cycle on an extract means dashboard data is up to an hour old. For use cases where changes happen frequently and those changes affect decisions made on the dashboard, live connections are more appropriate.

Hybrid Approach: Published Data Sources with Extracts

The most common production architecture uses published data sources — extracts published to Tableau Server or Tableau Cloud that multiple workbooks connect to — rather than embedding extract definitions in individual workbooks.

The published data source model provides:

**Consistent refresh schedules**: all dashboards connecting to the published data source receive the same data at the same time. A dashboard that connects to its own embedded extract may refresh on a different schedule than another dashboard — and the two dashboards will show different data to users who compare them.

**Centralised extract management**: extract size, refresh schedule, and optimisation are managed in one place rather than across dozens of workbooks. When the extract filter needs to be updated (extend the date range from 2 years to 3 years), it is changed in one published data source rather than in every workbook that embeds an extract.

**Storage efficiency**: multiple workbooks connecting to the same published extract share one copy of the extract file. The same workbooks each embedding their own extract would create multiple copies of the same data on the server.

Extract Performance Optimisation

For extracts that are large or slow to query:

**Filter the extract**: extract only the rows required for the dashboards using it. If dashboards show only the last 24 months of data, filter the extract to 24 months. This is often the single largest driver of extract performance improvement.

**Aggregate the extract**: if dashboards display only aggregated data (monthly revenue by region by product), extract pre-aggregated data rather than transaction-level data. The extract will be 100x smaller and queries will be proportionally faster.

**Materialise calculated fields**: frequently used LOD expressions and complex calculations, when materialised as extract columns, are answered by a column read rather than a calculation. For calculations that are used in multiple sheets on a dashboard, materialisation provides significant speedup.

**Incremental refresh**: for large extracts where a full refresh is slow, incremental refresh adds only new or modified rows rather than rebuilding the extract. Requires a reliable datetime column that captures all changes. Note: incremental refresh does not capture updates to historical rows unless the update also changes the datetime column.

Our managed BI services and Tableau consulting practice manages extract architecture for enterprise Tableau environments — contact us to discuss extract design and optimisation for your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →