BlogCloud Engineering

BigQuery Architecture: How Dremel, Slots, and Columnar Storage Work Together

James Okafor
James Okafor
Data & Cloud Engineer
·July 8, 202712 min read

BigQuery's performance and cost characteristics are often counterintuitive to engineers coming from traditional databases or even other cloud data warehouses. Understanding how BigQuery actually works — Dremel query execution, slot-based compute, Capacitor columnar storage, and the serverless architecture — explains behaviours that would otherwise seem unpredictable.

BigQuery's architecture is unlike any database most engineers have worked with. There is no cluster to provision, no indexes to create, no connection pool to configure. A query that scans 1 TB completes in seconds with no configuration, and the same query runs just as fast whether you run it once a day or ten teams run it concurrently. Understanding why this is possible — and where the system's constraints are — requires understanding the Dremel architecture, the columnar storage format, and the slot-based compute model.

Capacitor: The Columnar Storage Format

BigQuery stores data in Capacitor, Google's proprietary columnar storage format built on top of Google Colossus (GFS's successor). Capacitor is stored in a distributed fashion across many physical nodes in Google's data centres; individual tables are not stored on a single machine.

Capacitor stores data column by column rather than row by row. For a table with 100 columns, a query that reads 5 columns scans only the data in those 5 columns, not the full row width. This is the fundamental reason why SELECT * on a large BigQuery table is expensive while SELECT col1, col2 on the same table is cheap — BigQuery charges by bytes scanned, and column selectivity directly determines bytes scanned.

Capacitor applies column-level encoding and compression. Integer columns are often delta-encoded; string columns are dictionary-encoded. The combination of columnar layout and per-column encoding produces compression ratios of 3-10x for typical analytical data. Billed storage costs less than the raw data size.

**Partitioning**: BigQuery tables can be partitioned by a DATE or TIMESTAMP column, or by integer range. Partitioned tables store data in separate physical segments per partition value. A query with a filter on the partition column reads only the relevant partitions. A daily partitioned table with 5 years of data and a WHERE date = '2024-07-01' filter reads one day's partition, not all 1,825 days. This is the most direct mechanism for reducing bytes scanned and therefore query cost.

**Clustering**: Within a partition (or in the full table without partitioning), clustering organises data by the specified columns' sort order. Queries with filter or JOIN conditions on the clustered columns benefit from block-level pruning — BigQuery reads fewer Capacitor blocks because blocks that cannot contain matching values are skipped. Clustering improves both performance and cost for queries with appropriate filter conditions.

Dremel: The Query Execution Engine

Dremel is Google's query execution engine for columnar data. When you run a SQL query in BigQuery, Dremel processes it:

1. The query is parsed and compiled into an execution plan

2. The execution plan is distributed across many worker nodes (the number depends on available slot capacity)

3. Worker nodes read the relevant columns from Capacitor storage in parallel

4. Results are aggregated and returned

Dremel is a tree-structured execution engine: a root server distributes work to intermediate servers, which distribute work to leaf nodes that perform the actual data reads. This tree structure enables massive parallelism — a query that scans 1 TB can employ thousands of leaf nodes reading data simultaneously.

The degree of parallelism is bounded by two things: available slots (the compute capacity of your BigQuery project) and the data volume being scanned. A query that scans 100 GB cannot be parallelised across more nodes than the data can be effectively partitioned across. Dremel automatically determines the appropriate degree of parallelism for each query.

Slots: The Compute Model

BigQuery uses a slot-based compute model rather than a fixed cluster. A slot is a unit of computational capacity — approximately one CPU core with associated memory and disk resources. Dremel allocates slots from a pool to execute queries.

**On-demand (serverless) mode**: In the default on-demand mode, BigQuery automatically allocates slots from Google's shared pool when you run a query. Allocation is automatic and transparent — you never configure slot count. Billing is per TB scanned. The risk in on-demand mode is that during periods of very high demand, query execution may be slowed by competition for shared slots.

**Capacity reservations (flat-rate)**: You can purchase slot commitments (100-slot minimum, in multiples of 100) on monthly or annual contracts. Committed slots are dedicated to your project and are not shared. Flat-rate billing replaces per-TB billing. For projects with high query volume, flat-rate can be significantly more cost-effective than on-demand.

**Autoscaling** (available in the current BigQuery editions model): Baseline reserved slots plus burst capacity that scales automatically with workload demand. This model provides predictable baseline performance with flexibility for peak demand.

The practical implication of the slot model: a complex query that requires 100 slots to execute efficiently runs much faster than one that gets only 10 slots. In on-demand mode, you cannot control slot allocation. In reservation mode, the allocated slot count bounds query parallelism.

Query Result Cache

BigQuery maintains a query result cache for 24 hours. An identical query (same SQL, same referenced tables, same user) run within the cache period returns the cached result at zero cost. The cache operates at the user level — two different users running the same query do not share cache entries by default (though project-level caching configuration affects this).

Dashboard workloads that run the same queries repeatedly benefit substantially from the cache. A dashboard loaded by 100 users in a day, all running the same underlying queries, incurs the full compute cost only on the first run; subsequent runs hit the cache at zero cost. This makes BigQuery more cost-effective for high-read BI workloads than the per-query billing model implies at first glance.

The cache is invalidated when the referenced tables change. Queries against tables with streaming inserts or recent DML operations may not be cached even within the 24-hour window.

INFORMATION_SCHEMA for Monitoring

BigQuery's INFORMATION_SCHEMA views provide detailed query metadata for monitoring and cost management:

'INFORMATION_SCHEMA.JOBS_BY_PROJECT' records all jobs run in the project: user, query text, bytes billed, slot milliseconds, creation time, and duration. This is the primary source for cost attribution (which users or dashboards are generating the most cost), performance monitoring (which queries are slow), and usage analysis.

Queries against INFORMATION_SCHEMA.JOBS_BY_PROJECT return data partitioned by creation date; always include a filter on creation_time to avoid scanning the full job history at unnecessary cost.

Our cloud engineering and data architecture practice designs BigQuery implementations from storage layout to cost governance — contact us to discuss your BigQuery architecture requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →