Tableau Hyper API: Programmatic Extract Creation and Manipulation

James Okafor
Lead Data Engineer
October 13, 2027 · 11 min read

The Tableau Hyper API allows developers to create, read, update, and delete Tableau extract (.hyper) files programmatically — without a Tableau Server or Desktop installation. For data engineering teams that need to generate Tableau extracts as part of a data pipeline, or that need to manipulate existing extracts without triggering a full refresh, the Hyper API is the technical foundation.

The Hyper API is a library for Python, C++, Java, and .NET that reads and writes Tableau's .hyper extract format: it can create new extracts, query existing ones, and insert, update, or delete rows. It runs independently of Tableau Desktop, Tableau Server, and Tableau Cloud. A .hyper file is a self-contained database file that can be created and manipulated on any machine with the Hyper API installed.

For data engineering teams building pipelines that feed Tableau analytics, the Hyper API answers the question of how to get data into Tableau extracts without going through Tableau Desktop's data source publishing workflow or triggering full extract refreshes through Tableau Server.

When to Use the Hyper API

The Hyper API solves specific problems that the standard Tableau data source workflow does not address efficiently:

**Extract generation as part of a data pipeline** — a data pipeline produces a transformed dataset that should be published to Tableau for downstream analytics. Rather than configuring a Tableau data source that points to the pipeline's output table and scheduling a refresh, the pipeline can write the output directly to a .hyper file using the Hyper API, then publish the file to Tableau Server or Cloud using the REST API. The extract is current as soon as the pipeline publishes it, with no wait for a scheduled refresh.

**Incremental updates to large extracts** — a full extract refresh for a large dataset is expensive: the entire dataset is re-read, the extract is rebuilt from scratch, and server resources are consumed for the duration. The Hyper API supports incremental operations: appending new rows, updating existing rows by key, and deleting rows. For datasets where only a small percentage of rows change between refresh cycles, incremental Hyper API operations are significantly more efficient than full refreshes.

**Extract modification without Tableau Desktop** — changing the schema of an existing extract — adding columns, renaming columns, changing data types — normally requires opening the workbook in Desktop and modifying the data source. The Hyper API can modify extract schemas programmatically, enabling automated schema evolution as upstream data sources change.

**Custom data transformation before publishing** — the Hyper API can perform complex data transformations before writing to the extract, using the full expressive power of the Python or Java environment rather than being limited to what Tableau's data source transformation layer supports.

Hyper API Architecture

The Hyper API communicates with a local Hyper database engine process. When the Hyper API is initialised in code, it starts a local Hyper process; when the code exits the API context, the process shuts down. The Hyper file itself is a portable database file that can be moved between machines and published to Tableau Server or Cloud via the REST API.

The core object model:

**HyperProcess** — the local Hyper engine process. Started once and reused across connections.

**Connection** — a connection to a Hyper file. Operations are performed within a connection context.

**TableDefinition** — defines the schema of a table in the Hyper file: table name, columns (name and data type), and optional column-level nullability.

**Inserter** — a bulk-insert object for writing rows to a table efficiently. Inserter.add_row() buffers individual rows; Inserter.execute() sends the buffered rows to Hyper and closes the inserter.

A minimal Python example creating a Hyper file:

    from tableauhyperapi import (
        Connection, CreateMode, HyperProcess, Inserter,
        SqlType, TableDefinition, TableName, Telemetry,
    )

    # data_rows is assumed to be an iterable of records exposing
    # customer_id, revenue, and order_date (a datetime.date) attributes.
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint, database="output.hyper",
                        create_mode=CreateMode.CREATE_AND_REPLACE) as connection:
            table_def = TableDefinition(
                table_name=TableName("Extract", "Extract"),
                columns=[
                    TableDefinition.Column("customer_id", SqlType.int()),
                    TableDefinition.Column("revenue", SqlType.double()),
                    TableDefinition.Column("order_date", SqlType.date()),
                ],
            )
            # The schema must exist before a table can be created in it.
            connection.catalog.create_schema_if_not_exists("Extract")
            connection.catalog.create_table_if_not_exists(table_def)
            with Inserter(connection, table_def) as inserter:
                for row in data_rows:
                    inserter.add_row([row.customer_id, row.revenue, row.order_date])
                # Nothing is written until execute() is called.
                inserter.execute()

After the Hyper file is created, publish it to Tableau Server or Cloud using the REST API's Publish Data Source method.
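Before publishing, a pipeline may want to confirm that the file holds the rows it expects. The following is a minimal sketch of such a check, not part of the original example: count_rows and check_row_count are hypothetical helper names, and the sketch assumes it is called with an open Connection and an already-quoted table name such as str(table_def.table_name).

```python
def count_rows(connection, qualified_table):
    """Return the row count of a table through an open Hyper connection.

    qualified_table is assumed to be already quoted, for example the
    string form of a TableName: '"Extract"."Extract"'.
    """
    return connection.execute_scalar_query(
        f"SELECT COUNT(*) FROM {qualified_table}"
    )


def check_row_count(connection, qualified_table, expected):
    """Raise ValueError if the table does not hold the expected number of rows."""
    actual = count_rows(connection, qualified_table)
    if actual != expected:
        raise ValueError(
            f"{qualified_table}: expected {expected} rows, found {actual}"
        )
    return actual
```

Called inside the Connection block after the inserter has finished, a mismatch stops the pipeline before an incomplete extract reaches Tableau Server.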

Hyper API Data Types

The Hyper API exposes a rich set of SQL data types that map to standard Tableau data types:

**SqlType.text()** — variable-length Unicode text. Use for string dimensions.

**SqlType.int()** — 32-bit integer. Use for integer identifiers and counts.

**SqlType.big_int()** — 64-bit integer. Use for large integer values.

**SqlType.double()** — 64-bit floating point. Use for continuous numeric measures.

**SqlType.date()** — calendar date without time. Use for date dimensions.

**SqlType.timestamp()** — date and time without timezone. Use for datetime dimensions.

**SqlType.timestamp_tz()** — date and time with timezone. Use for events from multiple time zones.

**SqlType.bool()** — boolean.

**SqlType.numeric(precision, scale)** — fixed-precision decimal. Use for financial values where floating-point rounding is unacceptable.

Choosing the correct data type is important: string data types used for numeric values degrade query performance; double used for financial calculations introduces floating-point rounding errors. Match the Hyper data type to the semantic type of the data.
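One way to apply this rule in a pipeline is to derive the column type from a sample Python value rather than defaulting to text or double. The sketch below is illustrative only: sql_type_name and build_sql_type are hypothetical helpers, and the default numeric precision and scale of (18, 2) are assumptions that should be replaced with values suited to the data.

```python
from datetime import date, datetime
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def sql_type_name(value):
    """Return the name of the SqlType factory that fits a sample Python value."""
    if isinstance(value, bool):        # bool is a subclass of int, so test it first
        return "bool"
    if isinstance(value, int):
        return "int" if INT32_MIN <= value <= INT32_MAX else "big_int"
    if isinstance(value, Decimal):     # exact decimals map to fixed-precision numeric
        return "numeric"
    if isinstance(value, float):
        return "double"
    if isinstance(value, datetime):    # datetime is a subclass of date, so test it first
        return "timestamp_tz" if value.tzinfo is not None else "timestamp"
    if isinstance(value, date):
        return "date"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"no Hyper type mapping for {type(value).__name__}")


def build_sql_type(value, precision=18, scale=2):
    """Build the SqlType for a sample value (precision and scale are assumed defaults)."""
    from tableauhyperapi import SqlType  # imported lazily; needed only when building types

    name = sql_type_name(value)
    if name == "numeric":
        return SqlType.numeric(precision, scale)
    return getattr(SqlType, name)()
```

A single sample value cannot reveal that a column will later exceed the 32-bit range, so identifier columns that may grow are safer declared as big_int explicitly.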

Publishing Hyper Files via the REST API

The Hyper API creates the extract file; the REST API publishes it to Tableau Server or Tableau Cloud. The publish request uploads the .hyper file and creates or replaces a published data source.

Python example using the tableauserverclient library:

    import tableauserverclient as TSC

    server = TSC.Server("https://your-tableau-server", use_server_version=True)
    tableau_auth = TSC.PersonalAccessTokenAuth("token_name", "token_value", "site_name")

    with server.auth.sign_in(tableau_auth):
        datasource = TSC.DatasourceItem(project_id="your-project-id")
        # publish() returns the published DatasourceItem.
        datasource = server.datasources.publish(
            datasource,
            "output.hyper",
            TSC.Server.PublishMode.Overwrite,
        )

The publish mode "Overwrite" replaces an existing data source with the same name, or creates it if it does not exist. This is the standard mode for pipeline-driven extract updates.

Incremental Hyper Operations

For datasets where only recent or modified rows change, the Hyper API supports incremental operations that are cheaper than full replacement:

**Append** — use Inserter to add new rows to an existing Hyper file without clearing existing rows.

**Upsert (update + insert)** — execute SQL DELETE WHERE on rows that need to be replaced, then INSERT new versions via Inserter.

**Delete** — execute SQL DELETE WHERE to remove specific rows.

These operations work on a local copy of the Hyper file. For a published Tableau data source, the workflow is: download the data source from Tableau Server using the REST API (a published extract typically downloads as a packaged .tdsx archive that contains the .hyper file), apply incremental modifications using the Hyper API, and re-publish the modified file.
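The delete-then-insert upsert described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: quote_identifier, build_delete_sql, and upsert_rows are hypothetical helpers, keys are assumed to be integers, and the table name is assumed to be a TableName whose string form is already quoted. The delete and the insert are issued as two separate steps here, so a failure between them would leave the file without the replaced rows.

```python
def quote_identifier(name):
    """Double-quote a SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_delete_sql(qualified_table, key_column, keys):
    """Build a DELETE statement for a list of integer keys.

    qualified_table is assumed to be already quoted, e.g. '"Extract"."Extract"'.
    """
    key_list = ", ".join(str(int(k)) for k in keys)
    return (
        f"DELETE FROM {qualified_table} "
        f"WHERE {quote_identifier(key_column)} IN ({key_list})"
    )


def upsert_rows(connection, table_name, key_column, key_index, rows):
    """Delete rows whose keys appear in `rows`, then insert the new versions.

    Returns the number of rows deleted, as reported by execute_command.
    """
    from tableauhyperapi import Inserter  # imported lazily; needed only for the insert

    rows = list(rows)
    if not rows:
        return 0
    keys = [row[key_index] for row in rows]
    deleted = connection.execute_command(
        build_delete_sql(str(table_name), key_column, keys)
    )
    with Inserter(connection, table_name) as inserter:
        inserter.add_rows(rows)
        inserter.execute()
    return deleted
```

For the table from the earlier example, a call might look like upsert_rows(connection, TableName("Extract", "Extract"), "customer_id", 0, changed_rows), where changed_rows is a hypothetical list of row lists in column order.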

For very large extracts where even downloading the full file is expensive, consider whether the extract architecture should use multiple Hyper files — a current period file that is small enough to refresh fully, and a historical file that is appended to incrementally.
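The split between a current-period file and a historical file comes down to partitioning rows around a cutoff date before writing them. The sketch below shows only that partitioning step; split_by_period, the sample rows, and the cutoff are illustrative assumptions, and each partition would then be written to its own .hyper file with an Inserter.

```python
from datetime import date


def split_by_period(rows, cutoff, date_index):
    """Partition rows into (historical, current) around a cutoff date.

    Rows dated on or after the cutoff belong to the current period.
    """
    historical, current = [], []
    for row in rows:
        target = current if row[date_index] >= cutoff else historical
        target.append(row)
    return historical, current


# Sample rows: (customer_id, revenue, order_date)
rows = [
    (1, 120.0, date(2024, 1, 15)),
    (2, 75.5, date(2024, 3, 2)),
    (3, 210.0, date(2024, 3, 20)),
]
historical, current = split_by_period(rows, cutoff=date(2024, 3, 1), date_index=2)
```

With this arrangement the current-period file stays small enough to rebuild on every run, while the historical file only ever receives appends.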

Our data architecture and Tableau consulting practice designs extract pipeline architectures using the Hyper API for enterprise clients — contact us to discuss programmatic extract management for your Tableau environment.
