
Tableau Data Sources: Live Connections, Extracts, and Published Data Sources Explained

Obed Tsimi
Founder & Senior Tableau Architect
June 26, 2026 · 10 min read

Understanding how Tableau connects to data — live connections vs extracts, published data sources vs embedded connections, and Tableau Bridge for private networks — is essential for building performant, maintainable analytics.

The quick answer

Tableau connects to data in one of two modes: live connections (queries run against the source system in real time) or extracts (data is imported into Tableau's .hyper format and stored locally or on Tableau Server/Cloud). Either mode can be embedded in a single workbook or shared as a published data source (a connection hosted centrally on Tableau Server or Cloud and usable by multiple workbooks). Each option has different performance characteristics, refresh requirements, and governance implications. The choice between live and extract is one of the most consequential technical decisions in a Tableau deployment: getting it wrong produces either stale data or overloaded database infrastructure.

Live connections

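A live connection sends queries to the underlying database as users interact with the view: filters, drill-downs, and page loads all generate database queries, although Tableau can reuse cached results for a query it has already run. The Tableau workbook is essentially a visual SQL query builder; aggregation, filtering, and joins happen in the source database, while table calculations are computed by Tableau on the returned results.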
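To make the "visual SQL query builder" idea concrete, here is a minimal sketch that turns a view definition (dimensions, one measure, and filter selections) into a single aggregate query and runs it against an in-memory SQLite table standing in for the source database. The function name, table, and data are hypothetical, and Tableau's real generated SQL varies by connector and is usually far more verbose.

```python
import sqlite3


def build_aggregate_sql(table, dimensions, measure, filters=None):
    """Translate a simple view definition into one aggregate SQL query.

    Hypothetical illustration: table and column names are assumed to be
    trusted identifiers, and Tableau's real generated SQL differs by connector.
    """
    filters = filters or {}
    select_list = ", ".join(dimensions + [f"SUM({measure}) AS sum_{measure}"])
    sql = f"SELECT {select_list} FROM {table}"
    clauses = []
    for column, values in filters.items():
        # Escape single quotes so each value becomes a valid SQL string literal.
        literals = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
        clauses.append(f"{column} IN ({literals})")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql


# In-memory stand-in for the source database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, segment TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("West", "Consumer", 100.0), ("West", "Corporate", 50.0),
     ("East", "Consumer", 70.0), ("East", "Consumer", 30.0)],
)

# Selecting "Consumer" in a filter produces a new query; the database aggregates.
sql = build_aggregate_sql("orders", ["region"], "sales", {"segment": ["Consumer"]})
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Every filter change in a live view corresponds to a new call like this one, which is why slow or heavily loaded databases translate directly into slow dashboards.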

**When live connections are appropriate**: dashboards that require data current to the minute or second (operational dashboards, real-time KPI monitors); databases with fast, optimised query infrastructure (Snowflake, BigQuery, Redshift) that can handle concurrent Tableau queries without performance degradation; use cases where data currency is more important than response time.

**When live connections cause problems**: slow databases that cannot handle concurrent analytical queries; dashboards with many users or scheduled subscriptions (each subscription render generates a live query); complex workbooks with many marks and filter interactions (each interaction triggers a query round trip, adding latency).

**Optimising live connections**: use database-level optimisations (indexes, materialised views, clustering) to ensure Tableau's generated SQL runs efficiently. Use Tableau's Initial SQL to set session-level parameters. Use context filters to reduce the dataset before other filters are evaluated. Monitor query performance in the database's query log when live dashboards are slow — the Tableau-generated SQL is often verbose and benefits from database-side materialisation.
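Context filters change results, not only performance: in Tableau's order of operations, context filters are evaluated before Top N filters, while ordinary dimension filters are evaluated after them. The sketch below models that with hypothetical sales rows. Without a context filter, Top 3 is computed across all regions and the region filter is applied afterwards; with the region promoted to context, Top 3 is computed within the region. It is a plain-Python illustration, not Tableau's query engine.

```python
from collections import defaultdict

# Hypothetical (region, product, sales) rows.
ROWS = [
    ("West", "A", 500), ("West", "B", 300), ("West", "C", 200), ("West", "D", 100),
    ("East", "B", 900), ("East", "E", 800), ("East", "F", 700),
]


def top_n_products(rows, n):
    """Return the n products with the highest total sales in `rows`."""
    totals = defaultdict(int)
    for _, product, sales in rows:
        totals[product] += sales
    return set(sorted(totals, key=totals.get, reverse=True)[:n])


def view_without_context(rows, region, n):
    # Top N runs over every region first; the dimension filter runs afterwards.
    top = top_n_products(rows, n)
    return {product for r, product, _ in rows if r == region and product in top}


def view_with_context(rows, region, n):
    # Context filter: shrink the data to one region, then evaluate Top N.
    context = [row for row in rows if row[0] == region]
    return top_n_products(context, n)


print(view_without_context(ROWS, "West", 3))  # only products that are top 3 overall
print(view_with_context(ROWS, "West", 3))     # the top 3 within West
```

On a live connection, the context filter also means the downstream queries operate on the reduced data rather than the full table.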

Extracts

An extract imports data from the source system into Tableau's optimised columnar format (.hyper files). Once extracted, queries run against the extract rather than the source system, which can be 10–100x faster than a live connection for many analytical queries, depending on how fast and how busy the source system is.

**Full extracts**: replace the entire dataset on each refresh. Appropriate for: data sources where the full dataset is small enough to extract quickly (under a few hundred million rows in most cases); sources that do not have reliable timestamps for incremental extraction; or cases where full data accuracy is required and incremental logic is complex.

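**Incremental extracts**: append only the rows added since the last refresh, identified by a timestamp or sequential ID column. This is dramatically faster for large sources: a 1-billion-row fact table refreshed incrementally processes only the last day's rows rather than the full table. It requires a reliable, monotonically increasing column (such as created_at or a sequence ID) to determine which rows are new. Because an incremental refresh only appends, rows that are later updated or deleted in the source are not reflected in the extract (keying on updated_at re-appends changed rows as duplicates rather than updating them), so pair incremental refreshes with a periodic full refresh wherever source rows can change.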
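The high-water-mark logic behind an incremental refresh can be sketched with two in-memory SQLite databases, one standing in for the source system and one for the extract. Tableau tracks the maximum key itself; the `incremental_refresh` helper below is a hypothetical illustration of the idea, and it also shows why an in-place update in the source never reaches an append-only extract.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a minimal fact table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, amount REAL)")
    return db


def incremental_refresh(source, extract):
    """Append source rows whose id is above the extract's high-water mark."""
    # High-water mark: the largest id already in the extract (0 if it is empty).
    (high_water,) = extract.execute("SELECT COALESCE(MAX(id), 0) FROM facts").fetchone()
    new_rows = source.execute(
        "SELECT id, amount FROM facts WHERE id > ? ORDER BY id", (high_water,)
    ).fetchall()
    extract.executemany("INSERT INTO facts (id, amount) VALUES (?, ?)", new_rows)
    return len(new_rows)


source, extract = make_db(), make_db()
source.executemany("INSERT INTO facts VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
first = incremental_refresh(source, extract)  # copies ids 1-3

source.executemany("INSERT INTO facts VALUES (?, ?)", [(4, 40.0), (5, 50.0)])
source.execute("UPDATE facts SET amount = 99.0 WHERE id = 2")  # in-place change
second = incremental_refresh(source, extract)  # copies only ids 4-5

# Row 2 keeps its old value in the extract until a full refresh runs.
stale = extract.execute("SELECT amount FROM facts WHERE id = 2").fetchone()[0]
print(first, second, stale)
```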

**Extract refresh scheduling**: on Tableau Server or Tableau Cloud, extract refreshes are scheduled jobs; the Tableau backgrounder process runs the extraction on schedule and updates the .hyper file. Schedule refreshes when source database load is lowest. For critical dashboards, monitor backgrounder performance and set up alerts for failed refreshes.
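Finding the quiet window can start from the source database's own monitoring data. The sketch below assumes you have exported an average load figure for each hour of the day (the values here are hypothetical) and returns the start hour of the quietest block of consecutive hours, allowing the block to wrap past midnight. The `quietest_window` helper is illustrative; the schedule itself is still created in Tableau Server or Tableau Cloud.

```python
def quietest_window(hourly_load, hours):
    """Return the start hour of the lowest-load run of `hours` consecutive hours.

    `hourly_load` holds 24 values indexed by hour of day; windows may wrap
    past midnight. Ties go to the earliest start hour.
    """
    if len(hourly_load) != 24 or not 1 <= hours <= 24:
        raise ValueError("expected 24 hourly values and 1 <= hours <= 24")
    best_start, best_total = 0, float("inf")
    for start in range(24):
        total = sum(hourly_load[(start + i) % 24] for i in range(hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


# Hypothetical average CPU % of the source database, hour 0 through hour 23.
load = [20, 15, 10, 8, 9, 12, 30, 55, 70, 80, 85, 85,
        80, 80, 85, 80, 75, 60, 45, 40, 35, 30, 28, 25]
print(quietest_window(load, 3))  # start hour for a 3-hour refresh window
```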

**Extract storage**: .hyper files are stored on Tableau Server (in the Server's configured data directory) or on Tableau Cloud's managed storage. Large extracts consume significant disk space — a 100M-row extract compressed in .hyper format is typically 10–50GB depending on column count and cardinality. Monitor disk usage and implement retention policies.
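For capacity planning, a rough estimate multiplies row count by column count by an assumed average compressed size per value. The bytes-per-value figure is an assumption to calibrate against an existing extract, and the 20-column table is hypothetical; under those assumptions, the 10–50GB range for 100M rows corresponds to roughly 5 to 25 bytes per value.

```python
def estimate_extract_gb(rows, columns, bytes_per_value):
    """Rough extract size in GB (10**9 bytes), assuming a uniform compressed
    size per value. Real .hyper sizes vary with data types and cardinality."""
    return rows * columns * bytes_per_value / 1e9


# Hypothetical table: 100M rows, 20 columns.
low = estimate_extract_gb(100_000_000, 20, 5)    # highly compressible columns
high = estimate_extract_gb(100_000_000, 20, 25)  # high-cardinality columns
print(f"{low:.0f}-{high:.0f} GB")
```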

Published data sources

A published data source is a connection (live or extract) published to Tableau Server or Tableau Cloud as a standalone asset — separate from any specific workbook. Other workbooks can connect to a published data source, inheriting its connection settings, field definitions, calculated fields, and extract schedule.

**Why published data sources matter for governance**: without published data sources, each workbook embeds its own connection, potentially with different SQL, different calculated field definitions, and different refresh schedules. If the underlying database changes (a column is renamed, a table moves), every workbook with an embedded connection must be updated individually. With a published data source, one update propagates to all connected workbooks.
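The maintenance difference can be modelled as counting how many field-to-column mappings reference a renamed physical column. In the hypothetical setup below, three workbooks each embed their own mapping (with inconsistent field names), versus one published data source that all three share. The structures are illustrative and are not Tableau's internal metadata format.

```python
def edits_needed(mappings, renamed_column):
    """Count field mappings that must change when a physical column is renamed."""
    return sum(renamed_column in mapping.values() for mapping in mappings)


# Hypothetical: three workbooks, each embedding its own connection and field names.
embedded = [
    {"Customer Name": "cust_name", "Order Date": "order_dt"},  # Sales Overview
    {"Customer": "cust_name"},                                  # Churn Report
    {"Cust Name": "cust_name", "Region": "region"},             # Regional KPIs
]

# The same three workbooks connected to a single published data source instead.
published = [{"Customer Name": "cust_name", "Order Date": "order_dt", "Region": "region"}]

embedded_edits = edits_needed(embedded, "cust_name")    # every workbook must change
published_edits = edits_needed(published, "cust_name")  # one change covers all three
print(embedded_edits, published_edits)
```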

**Certified data sources**: Tableau Server and Cloud allow administrators to mark data sources as "Certified" — a visual indicator that the data source has been reviewed, validated, and meets governance standards. Certified data sources are surfaced prominently in Tableau's search interface. Certification programmes drive adoption of governed data sources over ad-hoc workbook connections.

**Row-level security in published data sources**: implement row-level security (user filters) in published data sources rather than in individual workbooks. A published data source with a user filter based on USERNAME() or group membership (ISMEMBEROF()) enforces consistent access control across all workbooks connected to that source. This is the scalable approach, rather than duplicating filter logic in every workbook.
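A common pattern is an entitlement table that maps users to the rows they may see, joined to the data and filtered with a calculated field such as `[Entitled User] = USERNAME()` applied as a data source filter (the field name is hypothetical). The Python sketch below models that logic with a hypothetical entitlement mapping; because the filter lives in the data source, every "workbook" function built on `rows_for_user` sees the same rows for a given user.

```python
# Hypothetical entitlement table: which regions each user may see.
ENTITLEMENTS = {
    "alice@example.com": {"West"},
    "bob@example.com": {"West", "East"},
}

ORDERS = [
    {"region": "West", "sales": 100},
    {"region": "East", "sales": 70},
    {"region": "South", "sales": 40},
]


def rows_for_user(rows, username):
    """Data-source-level user filter: keep only rows the user is entitled to."""
    allowed = ENTITLEMENTS.get(username, set())  # unknown users see nothing
    return [row for row in rows if row["region"] in allowed]


# Two different "workbooks" built on the same filtered source.
def total_sales(username):
    return sum(row["sales"] for row in rows_for_user(ORDERS, username))


def visible_regions(username):
    return sorted({row["region"] for row in rows_for_user(ORDERS, username)})


print(total_sales("alice@example.com"), visible_regions("bob@example.com"))
```

Defaulting unknown users to an empty set means a missing entitlement fails closed rather than exposing all rows.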

**Published data source governance best practices**: restrict who can publish data sources (Creator licence required, but further access controls can limit publishing to a governance group); require documentation before certification; periodically audit data source usage (Tableau REST API and Admin Insights can show which data sources are connected to how many workbooks and how recently they were queried).
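Once usage metadata has been exported, an audit can be reduced to a simple rule. The sketch assumes you have already pulled, for each data source, the number of connected workbooks and the last query date (for example from the REST API or Admin Insights); the record fields and the 90-day threshold are hypothetical choices, not values Tableau defines.

```python
from datetime import date, timedelta


def stale_data_sources(sources, today, max_idle_days=90):
    """Return names of data sources with no connected workbooks or no recent queries."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        s["name"] for s in sources
        if s["workbook_count"] == 0 or s["last_queried"] < cutoff
    )


# Hypothetical usage records assembled from an export.
sources = [
    {"name": "Sales", "workbook_count": 3, "last_queried": date(2026, 6, 20)},
    {"name": "Legacy Orders", "workbook_count": 0, "last_queried": date(2026, 6, 1)},
    {"name": "Finance", "workbook_count": 2, "last_queried": date(2026, 1, 15)},
]
flagged = stale_data_sources(sources, today=date(2026, 6, 26))
print(flagged)  # candidates for review, deprecation, or decertification
```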

Tableau Bridge

Tableau Bridge is a client application that allows Tableau Cloud to reach data sources that are not directly accessible from the internet: on-premises databases, databases in private VPCs, or any system behind a firewall.

Bridge runs on a Windows machine inside the private network (Tableau also offers Bridge for Linux in containers). When a Tableau Cloud workbook refreshes an extract or executes a live query against a private data source, the request is routed through Bridge, which maintains an outbound HTTPS connection to Tableau Cloud; Bridge executes the query against the database on the local network and returns the result.

**Bridge pools**: for high-volume extract refresh environments, multiple Bridge agents can be configured as a pool — Tableau Cloud distributes extract refresh jobs across available Bridge agents, providing load balancing and redundancy.
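The exact algorithm Tableau Cloud uses to distribute jobs is not described here, so the sketch below uses a simple least-loaded assignment to illustrate what a pool provides: jobs spread across the online agents, and an offline agent is skipped rather than causing refreshes to fail. The `assign_jobs` helper, agent names, and job names are all hypothetical.

```python
def assign_jobs(jobs, agents):
    """Assign each job to the online agent with the fewest jobs so far.

    `agents` maps agent name -> online flag. Offline agents receive nothing;
    ties are broken alphabetically so the result is deterministic.
    """
    load = {name: 0 for name, online in agents.items() if online}
    if not load:
        raise RuntimeError("no Bridge agent in the pool is online")
    assignment = {}
    for job in jobs:
        agent = min(load, key=lambda name: (load[name], name))
        assignment[job] = agent
        load[agent] += 1
    return assignment


pool = {"bridge-1": True, "bridge-2": False, "bridge-3": True}
plan = assign_jobs(["sales", "finance", "hr", "ops"], pool)
print(plan)
```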

**Live query via Bridge**: Bridge supports live connections for some data sources (ODBC-based connections). Live query through Bridge adds latency (the round trip from the cloud through Bridge to the database and back) and limits concurrency compared to direct cloud connections. For live connection use cases, prefer migrating the data source to a cloud-accessible endpoint (Snowflake, BigQuery, RDS with public access) over relying on Bridge for live queries.

Virtual connections

Tableau Virtual Connections (available on Tableau Cloud and Tableau Server with the Data Management add-on) provide a centralised, governable abstraction layer between Tableau and the underlying data. A virtual connection defines the database connection credentials, the tables available, joins between tables, and row-level security policies.

Workbooks and published data sources connect to the virtual connection rather than directly to the database — credentials and connection logic are centralised in the virtual connection, and all downstream assets inherit its row-level security.

Virtual connections are the most governance-focused data connectivity option in Tableau, but they require the Data Management add-on licence.

Connection type decision framework

For a new Tableau data source: if the source database is cloud-hosted (Snowflake, BigQuery, Redshift, Azure SQL Database) and the team has optimised it for analytical queries, start with a live connection. If live query performance is insufficient (dashboard load > 5 seconds, high database load), move to an extract with a refresh schedule appropriate for the data's staleness tolerance. If the data source will be shared across multiple workbooks, publish it to Tableau Server or Cloud.

For on-premises sources: extracts are almost always preferable to live connections for cloud-hosted Tableau environments, because live connections through Tableau Bridge to on-premises databases add latency and operational complexity that extracts avoid.
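The framework can be written as a small decision function. The field names are hypothetical, the 5-second threshold comes from the guidance above, and the fallback for cases the framework does not cover (for example, a cloud-hosted source that has not been tuned for analytics) is an assumption, marked as such in the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceProfile:
    cloud_hosted: bool                          # e.g. Snowflake, BigQuery, Redshift, Azure SQL Database
    tuned_for_analytics: bool                   # indexes, clustering, materialised views in place
    shared: bool = False                        # will several workbooks use this source?
    live_load_seconds: Optional[float] = None   # measured dashboard load time on a live trial
    high_db_load: bool = False                  # did the live trial strain the database?
    tableau_cloud: bool = True                  # Tableau Cloud rather than Tableau Server


def recommend(p: SourceProfile) -> str:
    if not p.cloud_hosted and p.tableau_cloud:
        # On-premises source behind a firewall: extract, refreshed through Bridge.
        mode = "extract (refreshed via Tableau Bridge)"
    elif p.cloud_hosted and p.tuned_for_analytics:
        # Start live; fall back to an extract if the live trial is too slow or too heavy.
        too_slow = p.live_load_seconds is not None and p.live_load_seconds > 5
        mode = "extract" if (too_slow or p.high_db_load) else "live"
    else:
        # Assumption: the framework does not cover this case; default to an extract.
        mode = "extract"
    return f"published {mode}" if p.shared else f"embedded {mode}"


print(recommend(SourceProfile(cloud_hosted=True, tuned_for_analytics=True)))
```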

For the broader Tableau Server environment context, see Tableau Server admin guide. For the Tableau Cloud migration context where Bridge becomes relevant, see Tableau Server to Cloud migration. For performance optimisation of Tableau workbooks, see Tableau calculated fields.

Our Tableau consulting practice designs and optimises data connectivity architectures for Tableau environments — extract strategies, published data source governance, row-level security, and Bridge configuration. Book a free 30-minute audit if your Tableau environment has performance or governance issues.
