
Apache Iceberg Explained: The Open Table Format for the Modern Data Lakehouse

James Okafor
Data & Cloud Engineer
July 10, 2026 · 11 min read

Apache Iceberg is the open table format that enables ACID transactions, schema evolution, time travel, and hidden partitioning on object storage. This guide covers what Iceberg actually does, how it compares to Delta Lake and Hudi, and when to use it.

Apache Iceberg is an open table format designed for large-scale analytic datasets on object storage. It adds ACID transactions, schema evolution, time travel, and hidden partitioning on top of Parquet, ORC, or Avro files in S3, ADLS, or GCS — without requiring a warehouse to own the data. This guide covers what Iceberg actually is, how it compares to Delta Lake and Apache Hudi, and when to use it.

What a table format is

To understand Iceberg, start with the problem it solves. A directory of Parquet files on S3 is not a table. It is a collection of files. To query it as a table, you need metadata: the schema, the list of files that constitute the table, statistics about those files, and a record of changes over time. Without this metadata, every query must list all files and infer the schema.

The traditional Hive Metastore provides basic metadata (table definitions and partition locations), but it was not designed for ACID guarantees, concurrent writes, or schema evolution. When multiple writers modify the same Hive table simultaneously, you risk race conditions and data corruption. When you add a column, you may need to update every existing partition definition.

A table format like Iceberg, Delta Lake, or Hudi provides structured metadata that enables full ACID transactions, optimistic concurrency control, schema evolution without partition rewrites, and a complete history of changes to the table.

Iceberg's metadata model

Iceberg maintains a multi-level metadata hierarchy:

**Metadata files** are JSON documents containing the table's current schema, partition specification, snapshot list, and current snapshot pointer. Each table operation creates a new metadata file; the current one is pointed to by the catalog.

**Manifest lists** are Avro files listing the manifest files that make up each snapshot. Each snapshot has exactly one manifest list.

**Manifest files** are Avro files listing the data files that make up the snapshot, with per-file statistics: row count, lower and upper bounds for each column, and null value counts. These statistics let the engine prune partitions and skip whole data files without opening them.

**Data files** are the actual Parquet (or ORC, or Avro) files containing the table data.

This layered structure allows Iceberg to answer "which files do I need to read for this query?" without scanning the entire table. A query filtering on event_date = '2026-05-25' reads the manifest list, scans manifests for files with event_date in range, and reads only matching data files. No full directory listing required.
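The file-selection step can be illustrated with a small sketch. This is a toy model in plain Python, not the Iceberg API: the `DataFileEntry` class, the `files_matching_equals` helper, and the file paths are all hypothetical, and real manifests are Avro files with typed bounds per column ID.

```python
from dataclasses import dataclass

@dataclass
class DataFileEntry:
    """One manifest row in this toy model: a data file plus min/max bounds."""
    path: str
    row_count: int
    lower: dict  # column name -> lower bound of values in the file
    upper: dict  # column name -> upper bound of values in the file

def files_matching_equals(manifest, column, value):
    """Keep only files whose [lower, upper] range could contain `value`."""
    return [
        entry.path
        for entry in manifest
        if entry.lower[column] <= value <= entry.upper[column]
    ]

# Hypothetical manifest contents; ISO date strings compare correctly as text.
manifest = [
    DataFileEntry("data/a.parquet", 1000,
                  {"event_date": "2026-05-01"}, {"event_date": "2026-05-20"}),
    DataFileEntry("data/b.parquet", 1200,
                  {"event_date": "2026-05-21"}, {"event_date": "2026-05-31"}),
    DataFileEntry("data/c.parquet", 900,
                  {"event_date": "2026-06-01"}, {"event_date": "2026-06-15"}),
]

selected = files_matching_equals(manifest, "event_date", "2026-05-25")
print(selected)  # ['data/b.parquet']
```

Only one of the three files is opened; the other two are ruled out from metadata alone.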

What Iceberg enables

**ACID transactions**: Iceberg uses optimistic concurrency control — writers read the current snapshot, apply changes, and attempt to commit a new snapshot. If another writer committed in the meantime, the commit fails with a conflict and the writer retries. This prevents the data corruption that occurs with naive concurrent writes to object storage.
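The commit protocol can be sketched as a compare-and-swap on the catalog's snapshot pointer. The code below is a simplified illustration under stated assumptions, not Iceberg's implementation: `ToyCatalog` and `append_with_retry` are hypothetical names, a snapshot is reduced to a tuple of file names, and a lock stands in for the catalog's atomic swap.

```python
import threading

class ToyCatalog:
    """Holds the current snapshot pointer; commit is an atomic compare-and-swap."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current_snapshot_id = 0
        self.files = ()  # data files visible in the current snapshot

    def load(self):
        with self._lock:
            return self.current_snapshot_id, self.files

    def commit(self, expected_id, new_files):
        """Succeed only if nobody committed since `expected_id` was read."""
        with self._lock:
            if self.current_snapshot_id != expected_id:
                return False  # conflict: the caller worked from a stale snapshot
            self.current_snapshot_id += 1
            self.files = new_files
            return True

def append_with_retry(catalog, new_file, max_attempts=5):
    """Re-read the latest snapshot and retry until the commit is accepted."""
    for attempt in range(1, max_attempts + 1):
        base_id, base_files = catalog.load()
        if catalog.commit(base_id, base_files + (new_file,)):
            return attempt
    raise RuntimeError("commit failed after retries")

catalog = ToyCatalog()
base_id, base_files = catalog.load()             # writer A reads snapshot 0
append_with_retry(catalog, "b.parquet")          # writer B commits first
stale_ok = catalog.commit(base_id, base_files + ("a.parquet",))
print(stale_ok)                                  # False: A's base is stale
attempts = append_with_retry(catalog, "a.parquet")  # A retries on fresh state
print(catalog.load())  # (2, ('b.parquet', 'a.parquet'))
```

Neither writer overwrites the other: the stale commit is rejected, and the retry builds on the snapshot that already contains the competing write.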

**Schema evolution**: You can add, drop, rename, or reorder columns without rewriting data files. Iceberg tracks column IDs separately from column names — renaming a column updates the schema metadata but leaves all existing data files untouched. This is possible because Iceberg data files use column IDs (integers) internally, not column names, so a name change does not invalidate existing files.
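A minimal sketch of why ID-based columns make renames cheap, assuming a toy representation in which each stored row is keyed by integer field ID. The `read_rows` helper, the schemas, and the column names are hypothetical and only illustrate the idea.

```python
# Rows as "stored" in an existing data file: keyed by field ID, not by name.
data_file_rows = [
    {1: 101, 2: "2026-05-25"},
    {1: 102, 2: "2026-05-26"},
]

# Schema versions map field ID -> current column name.
schema_v1 = {1: "order_id", 2: "order_dt"}
schema_v2 = {1: "order_id", 2: "order_date", 3: "channel"}  # rename 2, add 3

def read_rows(rows, schema):
    """Project stored rows through a schema; IDs missing in the file read as None."""
    return [{name: row.get(field_id) for field_id, name in schema.items()}
            for row in rows]

print(read_rows(data_file_rows, schema_v1)[0])
# {'order_id': 101, 'order_dt': '2026-05-25'}
print(read_rows(data_file_rows, schema_v2)[0])
# {'order_id': 101, 'order_date': '2026-05-25', 'channel': None}
```

The stored rows never change; only the ID-to-name mapping does, which is why the rename and the added column require no file rewrite.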

**Hidden partitioning**: Iceberg partitions are defined as transforms on source columns, such as year(event_timestamp), month(event_timestamp), or a 16-way bucket on order_id, rather than as separate physical partition columns. (The DDL spelling varies by engine: Spark writes bucket(16, order_id), Trino writes bucket(order_id, 16).) The partition values are computed by Iceberg and stored in manifest metadata. Users query on the source column (event_timestamp = '2026-05-25') and Iceberg handles partition pruning transparently. No more wrong-partition queries due to timestamp format mismatches.
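To make the transform idea concrete, here is a sketch of time-based partition transforms and the pruning they allow. It follows the general definition of the month and day transforms as counts since 1970-01-01, but it is an illustration, not Iceberg's code: the function names, the `files_by_day` layout, and the file names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def month_transform(ts):
    """Months elapsed since 1970-01."""
    return (ts.year - 1970) * 12 + (ts.month - 1)

def day_transform(ts):
    """Whole days elapsed since 1970-01-01 (UTC)."""
    return (ts - EPOCH).days

def prune_by_day(files_by_day, start, end):
    """Return files whose day partition overlaps timestamps in [start, end)."""
    lo = day_transform(start)
    hi = day_transform(end - timedelta(microseconds=1))
    return sorted(path
                  for day, paths in files_by_day.items()
                  if lo <= day <= hi
                  for path in paths)

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical table: the writer derived each partition value from the timestamp.
files_by_day = {
    day_transform(utc(2026, 5, 24)): ["d24.parquet"],
    day_transform(utc(2026, 5, 25)): ["d25.parquet"],
    day_transform(utc(2026, 5, 26)): ["d26.parquet"],
}

# The query filters on the source column only; the planner derives the partition range.
print(prune_by_day(files_by_day, utc(2026, 5, 25), utc(2026, 5, 26)))
# ['d25.parquet']
```

The user never mentions a partition column. The same transform is applied at write time and at planning time, so the two cannot drift apart.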

**Partition evolution**: The partition scheme can be changed without rewriting existing data. Add a new partitioning transform for future data while retaining the old partition scheme for historical data. Iceberg's metadata layer handles both transparently in the same table.
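One way to picture partition evolution is a planner that evaluates each file against the partition spec it was written with. The sketch below assumes a toy table that switched from monthly to daily partitioning; the `SPECS` registry, the `plan` function, and the file paths are hypothetical.

```python
from datetime import date

def month_value(d):
    """Months elapsed since 1970-01."""
    return (d.year - 1970) * 12 + (d.month - 1)

def day_value(d):
    """Days elapsed since 1970-01-01."""
    return (d - date(1970, 1, 1)).days

# Spec 0 is the old monthly layout; spec 1 is the newer daily layout.
SPECS = {0: month_value, 1: day_value}

files = [
    {"path": "old/2026-04.parquet", "spec_id": 0,
     "partition": month_value(date(2026, 4, 1))},
    {"path": "old/2026-05.parquet", "spec_id": 0,
     "partition": month_value(date(2026, 5, 1))},
    {"path": "new/2026-06-01.parquet", "spec_id": 1,
     "partition": day_value(date(2026, 6, 1))},
    {"path": "new/2026-06-02.parquet", "spec_id": 1,
     "partition": day_value(date(2026, 6, 2))},
]

def plan(files, start, end):
    """Select files that may hold dates in [start, end], using each file's own spec."""
    selected = []
    for f in files:
        transform = SPECS[f["spec_id"]]
        if transform(start) <= f["partition"] <= transform(end):
            selected.append(f["path"])
    return selected

print(plan(files, date(2026, 5, 20), date(2026, 6, 1)))
# ['old/2026-05.parquet', 'new/2026-06-01.parquet']
```

Old files are pruned at month granularity and new files at day granularity, within a single query and without rewriting the historical data.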

**Time travel**: Every committed snapshot is retained until it is explicitly expired. You can query a table as of a specific snapshot ID or timestamp; in Spark SQL, for example: SELECT * FROM orders FOR SYSTEM_TIME AS OF '2026-05-01 00:00:00' (other engines spell the clause differently). This enables regulatory compliance (reproducible historical reports), debugging (what did the table look like before the bad pipeline run?), and incremental processing (what changed between snapshot X and snapshot Y?).
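Resolving a timestamp to a snapshot amounts to finding the latest snapshot committed at or before that time. The sketch below shows that lookup on a toy snapshot log; `snapshot_as_of`, the log contents, and the snapshot IDs are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical snapshot log, ordered by commit time: (committed_at, snapshot_id).
snapshot_log = [
    (datetime(2026, 4, 1, 9, 0), 101),
    (datetime(2026, 4, 20, 9, 0), 102),
    (datetime(2026, 5, 10, 9, 0), 103),
]

def snapshot_as_of(log, as_of):
    """Return the ID of the latest snapshot committed at or before `as_of`."""
    commit_times = [committed_at for committed_at, _ in log]
    index = bisect_right(commit_times, as_of) - 1
    if index < 0:
        raise ValueError("no snapshot exists at or before the requested time")
    return log[index][1]

print(snapshot_as_of(snapshot_log, datetime(2026, 5, 1, 0, 0)))  # 102
```

Expiring a snapshot removes it from the log, which is why a time-travel query can only reach as far back as the oldest retained snapshot.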

**Incremental processing**: Iceberg's incremental read API returns only the data added between two snapshots (in Spark, incremental reads cover appended data; updates and deletes need a changelog-style read). This is more efficient than timestamp-based incremental processing because it does not require a lookback window or late-arrival handling: the snapshot diff is exact.
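The snapshot diff can be sketched as a walk along the parent chain from the newer snapshot back to the older one. This toy model covers appends only and is not the Iceberg API; the `snapshots` structure and `files_added_between` are hypothetical.

```python
# Hypothetical snapshot metadata: each snapshot records its parent and the
# files it appended.
snapshots = {
    1: {"parent": None, "added": ["f1.parquet"]},
    2: {"parent": 1, "added": ["f2.parquet", "f3.parquet"]},
    3: {"parent": 2, "added": ["f4.parquet"]},
}

def files_added_between(snapshots, start_exclusive, end_inclusive):
    """Collect files appended after `start_exclusive`, up to `end_inclusive`."""
    added = []
    current = end_inclusive
    while current is not None and current != start_exclusive:
        # Prepend so the result stays in commit order.
        added = snapshots[current]["added"] + added
        current = snapshots[current]["parent"]
    if current is None and start_exclusive is not None:
        raise ValueError("start snapshot is not an ancestor of the end snapshot")
    return added

print(files_added_between(snapshots, 1, 3))
# ['f2.parquet', 'f3.parquet', 'f4.parquet']
```

A consumer that stores the last snapshot ID it processed can call this repeatedly and never needs to guess how far back to look.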

Iceberg vs Delta Lake

Delta Lake (Databricks) and Iceberg solve the same problem with different trade-offs.

**Ecosystem**: Delta Lake is most tightly integrated with Databricks and Spark. Iceberg is designed to be engine-agnostic: it is supported by Spark, Flink, Trino, Presto, Hive, and Snowflake (via Iceberg external tables). If you run multiple query engines against the same data (Spark for ETL, Trino for interactive queries, Snowflake for reporting), Iceberg's multi-engine support is a significant advantage.

**Format**: Delta Lake uses a transaction log (JSON files in the _delta_log directory) alongside Parquet data files. Iceberg uses a hierarchical metadata model with Avro manifests. For most use cases, the performance characteristics are similar. Iceberg's per-column statistics in manifest files can give it a slight edge in file pruning for some workloads.

**Governance and ownership**: Data in a Delta Lake table is most naturally managed by Databricks or a Databricks-compatible engine. Data in an Iceberg table is owned by whoever owns the object storage and the catalog — you are not locked into a vendor-specific format. For organisations prioritising open standards and avoiding vendor lock-in, Iceberg is preferable.

**Maturity on Spark**: Delta Lake is older on Spark and has more mature Spark-specific optimisations. If Spark is your only engine, Delta Lake is a reasonable default. If multi-engine support is a requirement, choose Iceberg.

Iceberg vs Apache Hudi

Hudi (Hadoop Upserts, Deletes, and Incrementals) was designed primarily for CDC (change data capture) use cases — upserts and deletes at high frequency. Hudi is most commonly used on AWS with EMR and S3.

Iceberg has caught up with Hudi's CDC capabilities and offers better query performance for analytical workloads. For new data lake projects, most teams choose between Delta and Iceberg rather than Hudi, unless they have specific Hudi infrastructure in place.

When to use Iceberg

**Multi-engine environments**: If you need Spark, Flink, Trino, and Snowflake all reading and writing the same tables, Iceberg is the most interoperable choice.

**AWS environments not using Databricks**: Iceberg is natively integrated with AWS Glue, Amazon Athena, and Amazon EMR. If your data lake is on S3 and you are not invested in the Databricks ecosystem, Iceberg with the Glue catalog is a natural combination.

**Avoiding vendor lock-in**: Iceberg is a pure Apache project with no commercial entity controlling it. Delta Lake is open-source but originated at Databricks and is most mature in the Databricks context.

**Long-lived tables with schema evolution requirements**: Iceberg's schema evolution model is cleaner than alternatives for tables that will change structure over years.

Catalogs

Iceberg tables require a catalog to track the current metadata file location. Common catalog options:

- **Hive Metastore**: Traditional, widely supported.

- **AWS Glue**: Native in AWS, serverless, integrates with Athena and EMR.

- **Nessie**: Git-like branching and tagging for Iceberg tables — experimental but powerful.

- **REST Catalog**: The Iceberg REST Catalog specification enables any web service to act as a catalog.

- **JDBC Catalog**: Stores metadata in any relational database.

For cloud deployments, Glue (on AWS) or Unity Catalog (on Databricks, which supports Iceberg through the open Iceberg standard) are the most practical options.
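As a rough illustration of how a catalog choice is wired into Spark, the helper below builds the session properties for a Glue, Hive Metastore, or REST catalog. The property names follow the Iceberg Spark configuration as I understand it, so verify them against the documentation for your Iceberg version. The `iceberg_catalog_conf` helper, the catalog name `lake`, the bucket, and the URI are hypothetical.

```python
def iceberg_catalog_conf(name, kind, warehouse, uri=None):
    """Build Spark properties that register an Iceberg catalog called `name`."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
    }
    if kind == "glue":
        # AWS Glue catalog, with S3 used for data and metadata files.
        conf[f"{prefix}.catalog-impl"] = "org.apache.iceberg.aws.glue.GlueCatalog"
        conf[f"{prefix}.io-impl"] = "org.apache.iceberg.aws.s3.S3FileIO"
    elif kind in ("hive", "rest"):
        if uri is None:
            raise ValueError(f"a {kind} catalog needs a service URI")
        conf[f"{prefix}.type"] = kind
        conf[f"{prefix}.uri"] = uri
    else:
        raise ValueError(f"unsupported catalog kind in this sketch: {kind}")
    return conf

# Hypothetical names: catalog "lake", bucket "example-bucket".
for key, value in iceberg_catalog_conf(
        "lake", "glue", "s3://example-bucket/warehouse").items():
    print(f"{key}={value}")
```

Each key and value pair can be passed to a Spark session as a `--conf` option or through the session builder; tables are then addressed as `lake.<database>.<table>`.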

For the broader data lakehouse context, see our guides "What is a data lakehouse" and "Delta Lake guide". Our data architecture consulting practice designs and migrates organisations to modern lakehouse architectures. Book a free architecture review to discuss whether Iceberg is the right choice for your environment.
