Tableau Data Source Management: Published Sources, Extracts, and Governance

Tableau data sources are the foundation of the analytics stack. Published data sources centralise data connections and business logic; extract scheduling determines data freshness; access controls determine who can use each source. Managing these well is the difference between an analytics environment where metric definitions are consistent and data is reliably fresh, and one where every team has built its own connection to the underlying data, applied its own transformations, and produced its own version of each metric.

Published Data Sources vs. Embedded Connections

Tableau allows two approaches to data connectivity: published data sources (shared, governed connections on Tableau Server or Cloud) and embedded connections (connections built into individual workbooks, not shared).

**Published data sources** are the governance baseline. When a data source is published to Tableau Server or Cloud:

- Multiple workbooks can connect to the same data source, all using the same connection parameters, credentials, and (optionally) the same calculated fields and business logic

- Refreshes happen once for all connected workbooks, rather than once per workbook

- The data source can be certified, data quality warnings can be applied, and access can be controlled at the data source level

- Changes to the underlying data model (field additions, calculated field changes) propagate to all connected workbooks

**Embedded connections** are appropriate for personal exploration and workbooks that will never be shared. They are not appropriate for content that will be used by multiple teams or that needs to be maintained over time. Organisations with many embedded connections accumulate technical debt: when credentials change, every workbook with an embedded connection to the affected system must be updated individually. When a business logic change is required, every workbook must be updated independently.

The governance standard should be: all content published to shared projects must connect to a published data source, not an embedded connection.
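One way to audit this standard is to scan an inventory of workbook connections and flag any in shared projects that do not point at a published data source. The sketch below is illustrative only: the `WorkbookConnection` record, the project names, and the inventory are hypothetical, and it assumes that connections to published data sources are reported with the connection type `sqlproxy`, which you should verify against your own server before relying on it.

```python
from dataclasses import dataclass

# Assumption: a workbook's connection to a published data source is reported
# with the connection type "sqlproxy"; any other type is treated here as an
# embedded connection. Verify this against your own environment.
PUBLISHED_CONNECTION_TYPE = "sqlproxy"


@dataclass(frozen=True)
class WorkbookConnection:
    """Hypothetical inventory record: one connection used by one workbook."""
    workbook: str
    project: str
    connection_type: str


def find_embedded_connections(connections, shared_projects):
    """Return connections in shared projects that bypass published data sources."""
    return [
        c for c in connections
        if c.project in shared_projects
        and c.connection_type != PUBLISHED_CONNECTION_TYPE
    ]


# Hypothetical inventory, e.g. exported from the server beforehand.
inventory = [
    WorkbookConnection("Sales Overview", "Finance", "sqlproxy"),
    WorkbookConnection("Ad Hoc Pricing", "Finance", "snowflake"),
    WorkbookConnection("Scratchpad", "Personal Sandbox", "postgres"),
]

violations = find_embedded_connections(inventory, shared_projects={"Finance"})
for v in violations:
    print(f"{v.project}/{v.workbook}: embedded {v.connection_type} connection")
```

Workbooks in personal or sandbox projects are deliberately ignored, matching the exception for personal exploration described above.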

Extract Architecture

Extracts are snapshots of source data that Tableau stores in its proprietary .hyper file format. Extracts serve two purposes: performance (the Hyper engine is optimised for analytical queries) and availability (dashboards can still load when the source database is unavailable or slow).

Extract architecture decisions:

**Extract granularity** — extract refreshes can be full (rebuild the entire extract from scratch) or incremental (append only rows that are new since the last refresh, identified by a column such as a date or an ID). Full refreshes are simpler to manage and guarantee that the extract matches the source; incremental refreshes are faster and reduce load on the source database. Use incremental refreshes only for sources where the volume is high enough that full refreshes take too long, and only where the data is genuinely append-only (no updates to, or deletions of, historical records).
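The append-only requirement is easiest to see in a small simulation. The sketch below is not Tableau's implementation; it models an incremental refresh as "append every source row whose key is greater than the highest key already in the extract", with a hypothetical `order_id` key, and shows how a correction to a historical row is silently missed.

```python
def incremental_refresh(extract_rows, source_rows, key="order_id"):
    """Append only source rows whose key exceeds the highest key already extracted."""
    high_water_mark = max((row[key] for row in extract_rows), default=None)
    new_rows = [
        row for row in source_rows
        if high_water_mark is None or row[key] > high_water_mark
    ]
    return extract_rows + new_rows


# Extract as of the previous refresh.
extract = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": 250}]

# Source as of now: order 2 was corrected, order 3 is new.
source = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 200},  # historical row updated after extraction
    {"order_id": 3, "amount": 75},   # genuinely new row
]

refreshed = incremental_refresh(extract, source)

# Rows in the extract that no longer match the source.
stale = [row for row in refreshed if row not in source]
print(stale)
```

The new order is picked up, but the extract keeps the old amount for order 2. Only a full refresh would bring that row back in line with the source.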

**Refresh schedule** — how frequently the extract is rebuilt determines data freshness. Business decisions about extract schedules should be based on: how often the underlying data changes, how fresh the data needs to be for the decisions the dashboard supports, and the load that refreshes place on the source system. Scheduling all extracts at the same time (often a default configuration) creates a refresh peak that overloads both the source systems and the Tableau backgrounder process.
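To avoid a refresh peak, start times can be spread across the available window rather than sharing one slot. The following is a minimal sketch with hypothetical extract names and a hypothetical two-hour overnight window; it only computes a plan and does not talk to Tableau Server.

```python
from datetime import datetime, timedelta


def stagger_schedules(extract_names, window_start, window_minutes):
    """Spread refresh start times evenly across a window instead of one shared slot."""
    if not extract_names:
        return {}
    step = window_minutes / len(extract_names)
    return {
        name: window_start + timedelta(minutes=round(i * step))
        for i, name in enumerate(sorted(extract_names))
    }


# Hypothetical extracts and a 02:00-04:00 refresh window.
start = datetime(2024, 1, 1, 2, 0)
plan = stagger_schedules(["sales", "inventory", "finance", "marketing"], start, 120)

for name, when in plan.items():
    print(f"{name}: {when:%H:%M}")
```

Even spacing is the simplest policy. A real plan would likely also weight extracts by expected duration and by which source system they hit.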

**Extract size management** — extracts can grow very large if not managed. Incremental refreshes accumulate data indefinitely unless explicitly bounded, for example by an extract filter on a date range. Fields that are unused but not hidden before the extract is created still occupy extract storage; hiding them first keeps them out of the extract. Periodic review of extract sizes, available from Tableau Server's admin views and the REST API, identifies extracts that are unnecessarily large.
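A periodic size review can be as simple as a script over exported size figures. In this sketch the record fields (`size_mb`, `size_mb_30d_ago`) and both thresholds are assumptions chosen for illustration; the input would come from whatever export of the admin views or REST API you already have.

```python
def flag_large_extracts(extracts, max_size_mb=1024, max_growth_ratio=1.5):
    """Return (name, reason) pairs for extracts over a size cap or growing quickly."""
    flagged = []
    for e in extracts:
        if e["size_mb"] > max_size_mb:
            flagged.append((e["name"], "over size cap"))
        elif e["size_mb"] > e["size_mb_30d_ago"] * max_growth_ratio:
            flagged.append((e["name"], "fast growth"))
    return flagged


# Hypothetical size export: current size and size 30 days earlier.
report = flag_large_extracts([
    {"name": "orders", "size_mb": 4200, "size_mb_30d_ago": 4000},
    {"name": "clickstream", "size_mb": 900, "size_mb_30d_ago": 400},
    {"name": "customers", "size_mb": 120, "size_mb_30d_ago": 118},
])

for name, reason in report:
    print(f"{name}: {reason}")
```

The growth check is there to catch unbounded incremental extracts early, before they reach the size cap.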

**Backgrounder monitoring** — Tableau Server's backgrounder process handles extract refreshes, subscription delivery, and other background tasks. Backgrounder queue length and failure rate are the primary operational health metrics for Tableau Server. Refreshes that sit in the queue for hours are not delivering the data freshness the schedule promises; failures that raise no alert deliver stale data silently.
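Both health metrics can be derived from a list of background job records. The sketch below assumes a hypothetical record shape (`created_at`, `started_at`, `status`) and illustrative alert thresholds; adapt the field names to whatever your job export actually provides.

```python
from datetime import datetime


def backgrounder_health(jobs):
    """Summarise queue delay and failure rate from background job records."""
    finished = [j for j in jobs if j["status"] in ("success", "failed")]
    # Queue delay: minutes between a job being created and actually starting.
    delays = [
        (j["started_at"] - j["created_at"]).total_seconds() / 60
        for j in jobs
        if j["started_at"] is not None
    ]
    failures = sum(1 for j in finished if j["status"] == "failed")
    return {
        "max_queue_minutes": max(delays, default=0.0),
        "failure_rate": failures / len(finished) if finished else 0.0,
    }


# Hypothetical job records for one night's refresh window.
jobs = [
    {"created_at": datetime(2024, 1, 1, 2, 0),
     "started_at": datetime(2024, 1, 1, 2, 5), "status": "success"},
    {"created_at": datetime(2024, 1, 1, 2, 0),
     "started_at": datetime(2024, 1, 1, 4, 30), "status": "failed"},
    {"created_at": datetime(2024, 1, 1, 2, 0),
     "started_at": None, "status": "queued"},
]

health = backgrounder_health(jobs)

# Illustrative thresholds: alert on more than an hour queued or >10% failures.
needs_alert = health["max_queue_minutes"] > 60 or health["failure_rate"] > 0.10
print(health, needs_alert)
```

Feeding a summary like this into an existing alerting channel is what turns a silent failure into a visible one.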

Business Logic in Data Sources

Published data sources can contain calculated fields that apply business logic in the data source layer, available to all workbooks that connect to the source. Centralising business logic here has significant governance advantages: the revenue calculation is defined once, and every workbook uses it.

The design decision is where business logic belongs:

**Calculations that define key metrics** (revenue, margin, churn rate, conversion rate) should be in the published data source. These are the calculations that, if defined differently in different workbooks, create metric inconsistency.

**Calculations that are view-specific** (a specific segmentation for a specific dashboard, a calculated field that supports a specific visualisation technique) belong in the workbook. Putting these in the data source clutters the shared layer with workbook-specific logic.

**Calculations that require significant compute** — row-level calculations applied to millions of rows — should ideally be pushed upstream to the data warehouse layer (via dbt or equivalent). Computed in the data warehouse once and stored as a column, they are faster than computing at extract time or query time.

Access Control for Data Sources

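Published data sources have their own permissions, separate from those of the workbooks that use them. This allows data source access to be governed independently of content access: a user might have View access to a workbook without having the Connect capability on the underlying data source, which prevents them from building their own content from the same data. This arrangement depends on the workbook author embedding credentials for the data source when publishing; without embedded credentials, viewers also need Connect on the data source to see the workbook's data.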

Data source permissions to configure:

**View** — the user can see the data source exists and see its schema.

**Connect** — the user can use the data source to build visualisations in Tableau Desktop or Tableau Web Authoring.

**Download/Save a local copy** — the user can download the data source as a .tdsx file, enabling export of the underlying data. This permission should be restricted to authorised users.

**Overwrite** — the user can modify the data source definition, including adding and modifying calculated fields, and can republish over it. Tableau labels this capability Overwrite rather than Edit. It should be restricted to the data source owner and the data team.

Row-level security at the data source level — restricting which rows each user can see — is implemented via user filters or data source filters that reference the USERNAME() function. Row-level security defined in the published data source applies consistently to all workbooks connected to that source, without requiring each workbook developer to re-implement it.
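The filtering logic behind an entitlement-based user filter can be sketched outside Tableau. In the example below, the `ENTITLEMENTS` table, the usernames, and the `region` column are all hypothetical; the `username` argument stands in for the value that USERNAME() would return for the signed-in user.

```python
# Hypothetical entitlement table: which regions each username may see.
ENTITLEMENTS = {
    "alice@example.com": {"EMEA", "APAC"},
    "bob@example.com": {"AMER"},
}


def rows_visible_to(username, rows):
    """Keep only rows whose region the user is entitled to; unknown users see nothing."""
    allowed = ENTITLEMENTS.get(username, set())
    return [row for row in rows if row["region"] in allowed]


sales = [
    {"region": "EMEA", "revenue": 120},
    {"region": "AMER", "revenue": 300},
    {"region": "APAC", "revenue": 80},
]

print(rows_visible_to("bob@example.com", sales))
print(rows_visible_to("unknown@example.com", sales))
```

Defaulting unknown users to an empty set is a deliberate choice: a user missing from the entitlement table sees no rows rather than all of them.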

Our Tableau consulting practice designs data source architecture and governance frameworks for organisations managing Tableau at scale — contact us to discuss data source management for your Tableau environment.
