# Tableau Extract Optimisation: How to Make Extracts Faster and Smaller

Tableau extracts that are slow to refresh, slow to query, or prohibitively large to store are almost always fixable. The fixes fall into three categories: reducing the amount of data in the extract, restructuring the data before extraction, and optimising the extract engine settings. This guide covers all three.

## Why Extracts Become Problems

Extracts are created to solve a performance or availability problem — live connections to large databases are slow, or the database cannot handle BI query load. But extracts create their own performance challenges as they grow.

The most common extract problems:

- Full refresh extracts that take longer than the refresh window (a 4-hour refresh on an hourly schedule)

- Extracts so large that queries against them are slower than the live connection they replaced

- Storage costs that make the extract uneconomical (100GB extracts in Tableau Cloud)

- Incremental extracts that accumulate uncommitted transactions or duplicate rows

Understanding which problem you have determines which solution applies.

## Reduce Extract Size First

The most impactful optimisation is usually the simplest: do not extract data you do not need.

### Date Range Filtering

Most dashboards use data from a rolling window — the last 1 year, the last 3 years, the last 5 years. An extract that contains 10 years of history to support a dashboard that only shows 2 years is carrying 5x the necessary data.

Add a date range filter to the extract definition. For extracts refreshed on a rolling window, use a relative date filter (e.g., only rows where date >= DATEADD('year', -3, TODAY())) so the extract automatically includes new data and drops old data on each refresh.
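The rolling-window logic can be sketched in a few lines of Python. This is only an illustration of which rows a three-year relative filter keeps; the function names are hypothetical and nothing here calls Tableau itself.

```python
from datetime import date


def rolling_window_start(today: date, years: int) -> date:
    """Return the earliest date kept by a rolling window of `years` years."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return today.replace(year=today.year - years, day=28)


def keep_row(row_date: date, today: date, years: int = 3) -> bool:
    """True if the row falls inside the rolling window."""
    return row_date >= rolling_window_start(today, years)
```

Because the cutoff is derived from the refresh date, each refresh moves the window forward without anyone editing the filter.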

**Important:** In Tableau, extract filters are set differently from view filters. Extract filters are applied at extract creation time, not at query time. If the extract filter is wrong, you need to correct it in the extract definition and rebuild the extract. A view filter can only narrow what is already in the extract; it cannot bring back rows the extract filter excluded.

### Column Selection (Hide Unused Fields)

Tableau extracts store all fields from the data source by default. Unused fields consume storage without providing analytical value.

In the extract creation dialog, you can hide specific fields before creating the extract. Hidden fields are not included in the extract. For a table with 150 columns where only 30 are used in any published workbook, hiding the other 120 removes 80% of the columns. The actual size reduction depends on the data types and how well each column compresses, but it is usually substantial.

Before hiding fields: ensure no published workbook or published data source uses the fields you are about to hide. Field-level lineage, for example from the Tableau Metadata API or Tableau Catalog where it is available, can help identify fields that no workbook references.
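Once you have a list of the fields each workbook references, finding the candidates to hide is a set difference. The sketch below assumes you have already gathered that usage data; the function name and the sample field names are hypothetical.

```python
def unused_fields(datasource_fields, fields_used_by_workbook):
    """Return data source fields that no workbook references, sorted by name.

    datasource_fields: iterable of field names in the data source.
    fields_used_by_workbook: dict mapping workbook name -> iterable of field names.
    """
    used = set()
    for fields in fields_used_by_workbook.values():
        used.update(fields)
    return sorted(set(datasource_fields) - used)


# Hypothetical sample data for illustration only.
candidates = unused_fields(
    ["order_id", "order_date", "region", "legacy_code", "internal_note"],
    {
        "Sales Overview": ["order_id", "order_date"],
        "Regional Trends": ["order_date", "region"],
    },
)
```

Here `candidates` holds `internal_note` and `legacy_code`. Treat the result as a review list rather than an automatic hide list, since usage data can miss ad hoc workbooks.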

### Row-Level Filtering

Apply extract-level filters to exclude rows that will never be used analytically:

- Test records (WHERE is_test_record = false)

- Internal company users in product analytics (WHERE company_id != 1)

- Soft-deleted records that are never displayed (WHERE is_deleted = false)

- Non-current records in slowly changing tables (WHERE is_current = true)

Each filter reduces extract size and reduces the data the extract engine must process on every query.
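To estimate how much these filters would save before changing the extract, you can apply the same conditions to a sample of source rows. A minimal sketch, assuming rows are available as dictionaries with the column names used in the list above; the helper names are hypothetical.

```python
def keep_in_extract(row: dict) -> bool:
    """Combine the four row-level conditions into one predicate."""
    return (
        not row["is_test_record"]
        and row["company_id"] != 1
        and not row["is_deleted"]
        and row["is_current"]
    )


def kept_fraction(rows) -> float:
    """Fraction of sampled rows that would remain in the extract."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(keep_in_extract(r) for r in rows) / len(rows)
```

A kept fraction of 0.25 on a representative sample suggests the filtered extract would hold roughly a quarter of the rows.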

### Pre-Aggregate in the Data Source

If the extract connects to a raw fact table but the dashboard only needs daily aggregates, the extract contains row-level data that is never needed. Pre-aggregate in a view or dbt model before Tableau extracts the data.

A clickstream events table with 1 billion rows becomes a daily_session_summary table with 1 million rows when pre-aggregated at the dbt layer. The extract of the pre-aggregated table holds 1000x fewer rows, so it is far smaller and refreshes far faster.

This is the single most impactful optimisation for large extracts — and it requires changing the data model upstream, not just extract settings.
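The shape of such a pre-aggregation can be shown with an in-memory SQLite database standing in for the warehouse. This is a toy sketch: the `events` table and its columns are assumptions, and in practice the `GROUP BY` would live in a dbt model or database view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (session_id TEXT, event_date TEXT, event_type TEXT);
INSERT INTO events VALUES
  ('s1', '2024-01-01', 'page_view'),
  ('s1', '2024-01-01', 'click'),
  ('s2', '2024-01-01', 'page_view'),
  ('s3', '2024-01-02', 'page_view'),
  ('s3', '2024-01-02', 'click'),
  ('s3', '2024-01-02', 'click');

-- One row per day instead of one row per event.
CREATE VIEW daily_session_summary AS
SELECT event_date,
       COUNT(DISTINCT session_id) AS sessions,
       COUNT(*) AS events
FROM events
GROUP BY event_date;
""")

# Tableau would extract from the view, never from the raw events table.
summary = conn.execute(
    "SELECT event_date, sessions, events "
    "FROM daily_session_summary ORDER BY event_date"
).fetchall()
```

Six event rows collapse to two summary rows here. The trade-off is that any dimension dropped from the `GROUP BY` is no longer available for drill-down in the dashboard.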

## Incremental Refresh

Full refreshes rebuild the entire extract from the source on every run. For large tables, this means transferring the full dataset on every refresh cycle. Incremental refresh processes only new rows since the last refresh — dramatically faster for tables where new data arrives but old data does not change.

### How Incremental Refresh Works

Incremental refresh requires a column that identifies new records — typically a datetime column like created_at, updated_at, or event_timestamp. Tableau appends rows where the column value is greater than the maximum value in the existing extract.

To enable incremental refresh: in the extract creation dialog, select the incremental refresh option and specify the column and refresh type.

**Critical limitation:** Incremental refresh in Tableau only appends new rows. It does not update or delete existing rows. If rows in the source can be updated after creation (order status changes, customer attribute updates), incremental refresh will produce incorrect data — the extract will contain the old values indefinitely.
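The append-only behaviour and its consequence can be simulated in plain Python. This sketch models the rule described above (append rows whose key exceeds the current maximum); it is an illustration, not Tableau's actual implementation, and the sample rows are hypothetical.

```python
from datetime import datetime


def incremental_refresh(extract_rows, source_rows, key="created_at"):
    """Append source rows whose key is greater than the max key in the extract."""
    if not extract_rows:
        return list(source_rows)
    high_water_mark = max(row[key] for row in extract_rows)
    new_rows = [row for row in source_rows if row[key] > high_water_mark]
    return extract_rows + new_rows


extract = [{"order_id": 1, "status": "pending", "created_at": datetime(2024, 1, 1)}]
source = [
    # Order 1 was updated in place after the last refresh.
    {"order_id": 1, "status": "shipped", "created_at": datetime(2024, 1, 1)},
    # Order 2 is new.
    {"order_id": 2, "status": "pending", "created_at": datetime(2024, 1, 2)},
]
refreshed = incremental_refresh(extract, source)
```

After the refresh, order 2 has been appended, but order 1 still shows `pending` because its `created_at` did not move past the high-water mark.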

Incremental refresh is correct for:

- Immutable event tables (clickstream events, transaction logs, system events)

- Append-only time series data

Incremental refresh is incorrect for:

- Tables where rows are updated after creation (CRM records, subscription tables, order tables)

For tables that require both new rows and updated rows, consider:

- Connecting to a pre-processed dbt model that uses snapshot logic to track current state, then using full refresh

- Using a delete-and-insert incremental approach at the dbt layer before Tableau extracts
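The delete-and-insert idea is simple to state: remove every target row whose key appears in the batch of changed rows, then insert the batch. The Python below is a stand-in for what would run as SQL in the warehouse; the function name and the `order_id` key are assumptions for illustration.

```python
def delete_and_insert(target_rows, changed_rows, key="order_id"):
    """Replace target rows that share a key with the changed batch, then append the batch."""
    changed_rows = list(changed_rows)
    changed_keys = {row[key] for row in changed_rows}
    kept = [row for row in target_rows if row[key] not in changed_keys]
    return kept + changed_rows


target = [
    {"order_id": 1, "status": "pending"},
    {"order_id": 2, "status": "pending"},
]
batch = [
    {"order_id": 1, "status": "shipped"},  # updated row
    {"order_id": 3, "status": "pending"},  # new row
]
current_state = delete_and_insert(target, batch)
```

The resulting table holds one current row per order, so a full refresh of the Tableau extract from it always reflects the latest state.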

### Incremental Refresh Accumulation

Incremental extracts that have been running for extended periods can accumulate fragmentation. When Tableau appends rows to an incremental extract, the underlying extract file becomes fragmented — rows are not organised in optimal sort order, and the file contains internal overhead from repeated append operations.

Periodically run a full refresh to rebuild the extract from scratch, cleaning up the fragmentation. The Tableau Server admin can trigger a full refresh on a specific data source via the REST API or the server admin UI. Scheduling a monthly full refresh for incrementally-updated extracts maintains extract health.

## Extract Scheduling Strategy

Extract refresh scheduling determines when refreshes run relative to when users access dashboards.

**Stagger refreshes to avoid contention:** If all extracts are scheduled to refresh at 6am, they queue on the backgrounder processes simultaneously. The last extract in the queue finishes hours after the intended refresh time. Distribute refresh schedules across the overnight window.
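One way to plan a staggered schedule is to spread start times evenly across the overnight window. The helper below only computes the times; it is a planning sketch with a hypothetical function name and does not create schedules on Tableau Server.

```python
from datetime import datetime, timedelta


def stagger_start_times(extract_names, window_start, window_end):
    """Spread extract start times evenly across [window_start, window_end)."""
    names = list(extract_names)
    if not names:
        return {}
    step = (window_end - window_start) / len(names)
    return {name: window_start + i * step for i, name in enumerate(names)}


plan = stagger_start_times(
    ["sales", "finance", "marketing", "support"],
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 4, 0),
)
```

With four extracts and a midnight to 4am window, the starts land one hour apart. Even spacing ignores how long each refresh takes, so long-running extracts may deserve wider gaps.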

**Buffer before peak usage:** Schedule extracts to complete before users arrive. If the business day starts at 8am, complete all critical extract refreshes by 7am with enough buffer to handle a failed first attempt and a retry.
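The buffer can be worked out as simple arithmetic: deadline minus the time for the first attempt, each retry, and a safety margin. A small sketch, where the function name and the 15-minute default margin are assumptions:

```python
from datetime import datetime, timedelta


def latest_safe_start(deadline, typical_duration, retries=1,
                      safety_margin=timedelta(minutes=15)):
    """Latest start time that still leaves room for `retries` failed attempts."""
    total = typical_duration * (retries + 1) + safety_margin
    return deadline - total


start_by = latest_safe_start(
    deadline=datetime(2024, 1, 1, 7, 0),
    typical_duration=timedelta(minutes=45),
)
```

For a 45-minute refresh with a 7am deadline and one retry, the refresh needs to start by 5:15am.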

**Failure notification:** Configure email or Slack notifications for failed extract refreshes. Silent failures, where an extract fails and keeps serving stale data without anyone being notified, are the most damaging failure mode. By default, Tableau Server notifies the extract owner, so make sure each extract is owned by an account with an actively monitored inbox, not a former employee's address.
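As a second line of defence against silent failures, a staleness check can flag any extract whose last successful refresh is older than expected. The sketch below assumes you can obtain last-success timestamps from your server's job history; the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def stale_extracts(last_success_by_extract, now, max_age):
    """Return names of extracts whose last successful refresh is older than max_age."""
    return sorted(
        name
        for name, last_success in last_success_by_extract.items()
        if now - last_success > max_age
    )


stale = stale_extracts(
    {
        "sales": datetime(2024, 1, 10, 6, 0),
        "finance": datetime(2024, 1, 8, 6, 0),
    },
    now=datetime(2024, 1, 10, 8, 0),
    max_age=timedelta(hours=24),
)
```

Here only `finance` is flagged. A check like this catches stale data regardless of why the refresh did not succeed, including cases where no failure email was ever sent.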

## Extract Engine Optimisation

### Materialise Calculations

Tableau's extract engine evaluates calculated fields at query time by default. Complex calculations that are evaluated on every query slow down dashboard load. For calculations that do not change per-query (fixed business logic, date formatting, string transformations), materialise the calculation in the extract.

In Tableau Desktop, select the data source on the Data menu and choose Extract > Compute Calculations Now to pre-compute and store eligible calculated field values in the extract. Queries then read the stored values rather than evaluating the calculation.

Only materialise calculations that are stable — calculations that reference dynamic inputs (TODAY(), NOW(), user functions) cannot be materialised.
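A rough way to triage a long list of calculated fields is to scan their formulas for calls to date-now and user functions. The check below is a deliberately simple heuristic with a hypothetical function name: it matches function names by text, so it can be fooled by string literals or comments, and it does not cover every reason a calculation might be ineligible.

```python
import re

# Tableau functions whose result depends on the current time or the current user.
VOLATILE_FUNCTIONS = (
    "TODAY", "NOW",
    "USERNAME", "FULLNAME", "USERDOMAIN",
    "ISMEMBEROF", "ISUSERNAME", "ISFULLNAME",
)

_VOLATILE_CALL = re.compile(
    r"\b(" + "|".join(VOLATILE_FUNCTIONS) + r")\s*\(",
    re.IGNORECASE,
)


def references_volatile_input(formula: str) -> bool:
    """True if the formula appears to call a time- or user-dependent function."""
    return bool(_VOLATILE_CALL.search(formula))
```

For example, `DATEDIFF('day', [Order Date], TODAY())` is flagged, while `UPPER([Region])` is not.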

### Extract Format

Tableau extracts are stored in the .hyper format (Tableau 10.5+). Older .tde extracts should be upgraded to .hyper, which provides better compression, faster refresh, and improved query performance.

If you have legacy .tde extracts, recreate them as .hyper extracts by running a full refresh on Tableau Server (version 10.5 or later).

## When to Replace an Extract with a Different Architecture

Some extract performance problems are symptoms of an architecture that needs to change, not just an extract that needs to be configured differently.

**A 100GB+ extract that takes 8 hours to refresh** is a signal that the pre-processing should move upstream. Connect Tableau to a pre-aggregated dbt mart table rather than a raw 100GB fact table. The mart table is small; the extract is small; the refresh is fast.

**An extract created only because the source database is too slow** suggests the database needs optimisation or the pre-aggregation should happen before Tableau. Tableau extracts are designed to solve query performance problems, not data model problems.

**Multiple extracts connecting to the same underlying tables** suggests a shared certified data source architecture would be more efficient — one extract, multiple workbooks connecting to it, rather than each workbook maintaining its own extract of the same underlying data.

Our Tableau consulting practice optimises Tableau extract architectures for performance and scalability — contact us to discuss extract performance for your environment.
