
What Is a Tableau Extract? How .hyper Files Speed Up Dashboard Performance

Obed Tsimi
Founder & Lead Tableau Architect
June 19, 2028 · 10 min read

Tableau extracts are local copies of your data stored in the columnar Hyper format — the primary mechanism for making dashboards fast and independent of source system availability. This guide explains how extracts work, when to use them over live connections, and how to manage extract refresh schedules.

When you create an extract, Tableau queries your source system once, copies the resulting data into a .hyper file, and stores it locally or on Tableau Server or Tableau Cloud. Subsequent queries against the extract hit the Hyper engine directly, not your source database.

This distinction matters for performance. An extract of a 50-million-row sales table can often answer aggregate queries in well under a second because Hyper's columnar format is purpose-built for analytical aggregations. The same query against a live PostgreSQL or Oracle database can take seconds or minutes, depending on indexing, concurrent load, and network latency. For most BI workloads, where dashboards refresh on a schedule rather than requiring real-time freshness, extracts deliver a noticeably better user experience.

The Hyper Engine

Tableau's Hyper in-memory analytical processing technology is the engine behind extracts. Hyper uses a columnar storage format — data stored column-by-column rather than row-by-row — which is optimal for analytical queries that aggregate values within columns across many rows. A query summing revenue across 100 million rows reads only the revenue column from disk; row-based storage would require reading every column of every row.
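The difference between the two layouts can be sketched in a few lines of Python. This is a toy model and not how Hyper is implemented: the table, column names, and "values read" counter are all made up for illustration, and the counter simply stands in for the amount of data a scan has to touch.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
# This is NOT Hyper's implementation; it only counts values touched.

rows = [
    {"order_id": 1, "region": "East", "quantity": 3, "revenue": 120.0},
    {"order_id": 2, "region": "West", "quantity": 1, "revenue": 45.5},
    {"order_id": 3, "region": "East", "quantity": 7, "revenue": 310.0},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_revenue_row_store(table):
    """Sum revenue from a row layout; every field of every row is visited."""
    total, values_read = 0.0, 0
    for row in table:
        values_read += len(row)  # a row store loads the whole record
        total += row["revenue"]
    return total, values_read


def sum_revenue_column_store(table):
    """Sum revenue from a column layout; only one column is visited."""
    revenue = table["revenue"]
    return sum(revenue), len(revenue)


row_total, row_reads = sum_revenue_row_store(rows)
col_total, col_reads = sum_revenue_column_store(columns)
print(f"row store:    total={row_total}, values read={row_reads}")
print(f"column store: total={col_total}, values read={col_reads}")
```

Both scans return the same total, but the row layout touches four values per row while the column layout touches one. The gap widens with every extra column the query does not need.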

Hyper files (.hyper extension) are self-contained. A packaged workbook (.twbx) bundles its extract with the workbook definition, so you can email it to someone and they can open and interact with it without any database connection. An unpackaged .twb file only references the extract and does not carry the data. This portability makes extracts useful for distributing dashboards outside of connected environments.

Extract vs Live Connection

Use a live connection when:

- Your data changes continuously and dashboards must reflect the current state — inventory levels, active incidents, real-time sales floors

- The source system is fast enough to support interactive query latency (sub-2-second response for filter changes)

- Regulatory or security requirements prohibit copying data outside the source system

- Data volume is small enough that live query performance is acceptable

Use an extract when:

- Your data changes on a schedule (daily, hourly) and live freshness is not required

- Source system query performance is too slow for interactive dashboard use

- The source system has limited concurrent connections and BI load would impact production workloads

- You need to combine data from multiple sources and a live cross-database join is unsupported or too slow

- Users access dashboards from locations with unreliable connectivity to the source system

For most enterprise BI workloads — sales performance, operational metrics, financial reporting, customer analytics — scheduled extract refresh is the right pattern. The data is fresh enough for the decisions being made; the performance benefit is substantial.

Extract Refresh Types

**Full refresh:** Tableau drops the existing extract and rebuilds it entirely from the source query. Every extract refresh is a full refresh unless you configure incremental refresh. Full refresh is simple, reliable, and correct — the extract matches the source query exactly. For most extract sizes up to several hundred million rows, full refresh is appropriate.

**Incremental refresh:** Tableau queries only rows added since the last refresh, appending them to the existing extract. Incremental refresh requires a reliable incremental identifier — a timestamp or auto-incrementing ID — that allows Tableau to query only new rows. Incremental refresh is appropriate when:

- The source table is very large and full refresh takes longer than the refresh window allows

- The source system is rate-limited and full table scans are expensive

- You can trust the incremental identifier (append-only tables where records are never updated)

Incremental refresh does not handle updates or deletes in the source. If source records are modified after initial extract, incremental refresh will not capture those changes — only new rows are appended. For slowly changing dimension data or tables with updates, full refresh is required.
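The append-only behavior, and the way it misses updates and deletes, can be reproduced with a small simulation. The sketch below uses two in-memory SQLite databases as stand-ins for the source and the extract; the `orders` table and both refresh functions are hypothetical and only mimic the logic described here, not Tableau's actual refresh code.

```python
import sqlite3

# Hypothetical source table and extract copy, both held in in-memory SQLite.
source = sqlite3.connect(":memory:")
extract = sqlite3.connect(":memory:")
for db in (source, extract):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])


def incremental_refresh(source_db, extract_db):
    """Append source rows whose id exceeds the extract's current maximum."""
    high_water = extract_db.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders"
    ).fetchone()[0]
    new_rows = source_db.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (high_water,)
    ).fetchall()
    extract_db.executemany("INSERT INTO orders VALUES (?, ?)", new_rows)
    return len(new_rows)


def full_refresh(source_db, extract_db):
    """Discard the extract contents and copy everything from the source."""
    extract_db.execute("DELETE FROM orders")
    all_rows = source_db.execute("SELECT id, amount FROM orders").fetchall()
    extract_db.executemany("INSERT INTO orders VALUES (?, ?)", all_rows)
    return len(all_rows)


first_load = incremental_refresh(source, extract)  # copies ids 1 and 2

# The source then gains a row, changes a row, and loses a row.
source.execute("INSERT INTO orders VALUES (3, 30.0)")
source.execute("UPDATE orders SET amount = 99.0 WHERE id = 1")
source.execute("DELETE FROM orders WHERE id = 2")

appended = incremental_refresh(source, extract)  # sees only id 3
stale = extract.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()

full_refresh(source, extract)
fresh = extract.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()

print("after incremental:", stale)
print("after full:       ", fresh)
```

After the incremental run the extract still holds the old amount for id 1 and the deleted id 2. Only the full refresh brings it back in line with the source.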

Refresh Scheduling

On Tableau Server and Tableau Cloud, extract refresh schedules are attached to published data sources or to workbooks with embedded extracts. Schedules specify frequency (hourly, daily, weekly, monthly), time of day, and, for Tableau Cloud, timezone.

Scheduling considerations:

- Schedule refreshes during off-peak hours, when Backgrounder capacity is available and few users are affected by the brief refresh window

- Stagger refreshes when multiple extracts feed the same workbook — refresh upstream data sources before downstream ones

- Set alerts for extract failures — a failed refresh means dashboards are showing stale data without users knowing

- Monitor refresh duration — extracts that are growing should be reviewed before they exceed the refresh window
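One way to plan the upstream-before-downstream ordering is to treat the extracts as a dependency graph and derive start times from it. The sketch below is a planning aid only and does not call any Tableau API; the extract names, durations, and start of the refresh window are assumed values.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key is refreshed after the sources it lists.
depends_on = {
    "sales_raw": set(),
    "customers_raw": set(),
    "sales_enriched": {"sales_raw", "customers_raw"},
    "exec_summary": {"sales_enriched"},
}

# Assumed typical run time per extract in minutes (made-up numbers).
duration_min = {
    "sales_raw": 20,
    "customers_raw": 5,
    "sales_enriched": 15,
    "exec_summary": 5,
}


def plan_start_times(graph, durations, window_start):
    """Start each refresh once all of its upstream refreshes should be done."""
    finish = {}
    starts = {}
    for name in TopologicalSorter(graph).static_order():
        upstream_done = max(
            (finish[dep] for dep in graph[name]), default=window_start
        )
        starts[name] = upstream_done
        finish[name] = upstream_done + timedelta(minutes=durations[name])
    return starts


# The date is arbitrary; only the time of day matters for the plan.
plan = plan_start_times(depends_on, duration_min, datetime(2028, 6, 19, 2, 0))
for name, start in sorted(plan.items(), key=lambda item: item[1]):
    print(f"{start:%H:%M}  {name}")
```

The planned times are only as good as the duration estimates, which is one more reason to track how long each refresh actually takes.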

For Tableau Server, the number of Backgrounder processes determines extract refresh throughput. Sites with many scheduled extracts require sufficient Backgrounder capacity to avoid queues where extracts wait hours past their scheduled time.
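A rough feel for how queueing grows can be had from a simple simulation. The model below is a deliberate simplification: it assumes every job is due at the same minute, each Backgrounder runs one job at a time, and jobs are picked up in list order. Real Tableau Server scheduling also considers priorities and other job types, and the workload numbers are invented.

```python
import heapq


def simulate_queue(job_minutes, backgrounders):
    """Return (start, finish) minutes per job when all jobs are due at minute 0.

    Jobs are picked up in list order by whichever worker frees up first.
    """
    free_at = [0] * backgrounders  # minute at which each worker becomes idle
    heapq.heapify(free_at)
    schedule = []
    for minutes in job_minutes:
        start = heapq.heappop(free_at)
        finish = start + minutes
        heapq.heappush(free_at, finish)
        schedule.append((start, finish))
    return schedule


# Hypothetical workload: twelve 30-minute refreshes scheduled for the same time.
jobs = [30] * 12
for workers in (2, 4):
    schedule = simulate_queue(jobs, workers)
    worst_wait = max(start for start, _ in schedule)
    print(f"{workers} backgrounders: last job waits {worst_wait} min")
```

Under these assumptions, doubling the worker count from two to four cuts the longest wait from 150 minutes to 60.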

Extract Optimization

**Filter at source:** Apply extract filters to exclude rows, and hide unused fields to exclude columns, that are not needed for analysis. An extract of a transaction table filtered to a rolling 3 years, with only the columns used in dashboards, is a fraction of the size of a full table extract: smaller file, faster refresh, faster queries.

**Aggregate extract:** For dashboards that never drill to row-level detail, pre-aggregate the extract to the grain used in analysis. Tableau's "Aggregate data for visible dimensions" option, combined with hiding unused fields, creates an aggregated extract that can be far smaller than the row-level equivalent and faster to query.
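The effect of aggregating to a coarser grain is easy to see on a small sample. The sketch below collapses hypothetical row-level transactions to one record per date and region. It illustrates the idea of pre-aggregation in plain Python and is not Tableau's extract logic; the data and function name are made up.

```python
from collections import defaultdict

# Hypothetical row-level transactions: (order_date, region, revenue).
transactions = [
    ("2028-06-01", "East", 100.0),
    ("2028-06-01", "East", 50.0),
    ("2028-06-01", "West", 75.0),
    ("2028-06-02", "East", 20.0),
    ("2028-06-02", "East", 30.0),
    ("2028-06-02", "West", 10.0),
    ("2028-06-02", "West", 40.0),
]


def aggregate_to_grain(rows):
    """Collapse rows to one record per (date, region), summing revenue."""
    totals = defaultdict(float)
    for order_date, region, revenue in rows:
        totals[(order_date, region)] += revenue
    return dict(totals)


aggregated = aggregate_to_grain(transactions)
print(f"{len(transactions)} detail rows -> {len(aggregated)} aggregated rows")
```

The trade-off is that any dimension dropped from the grain can no longer be used to filter or drill, so the grain has to match what the dashboards actually display.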

**Logical tables vs physical tables in the Hyper file:** When an extract contains multiple tables related in the logical layer, Tableau stores them as separate tables in the Hyper file and joins them at query time. Merging tables before extraction (using custom SQL or a prepared view) can improve performance when the same join is executed repeatedly.

**Review extract size trends:** Extracts grow as source data grows. Extract sizes approaching several hundred gigabytes should be reviewed — whether filtering can reduce them, whether incremental refresh should be configured, or whether the data architecture feeding the extract should be restructured.

Our Tableau consulting and managed BI services practices cover extract architecture, refresh scheduling, and performance optimization for Tableau environments of all sizes. Contact us to discuss your Tableau extract requirements.
