BlogData Engineering

Snowflake External Tables: Querying S3 and Cloud Storage Without Loading Data

James Okafor
James Okafor
Data & Cloud Engineer
·July 4, 20269 min read

Snowflake External Tables let you query data in S3, Azure Data Lake, or Google Cloud Storage using Snowflake SQL — without loading it into Snowflake. Here is when to use them and how they compare to materialised tables.

External tables in Snowflake let you query files stored in cloud object storage — S3, Azure Blob Storage, ADLS Gen2, or GCS — as if they were Snowflake tables, without loading the data into Snowflake's managed storage. This guide covers what external tables are, when to use them, how they compare to materialised tables, and the performance and cost trade-offs you need to understand before using them in production.

What external tables actually are

An external table is a Snowflake metadata object that points at a stage (an external location — S3 bucket, Azure container, GCS bucket) and defines a schema for reading files at that location. When you query an external table, Snowflake reads the underlying files at query time, parses them according to the file format definition, applies the schema, and returns results.

The data never moves. The files stay in your object storage. Snowflake reads them on demand.

External tables are read-only. You cannot run DML (INSERT, UPDATE, DELETE) against them. You can query them with SELECT, join them to other tables, and build materialised views on top of them.

Supported storage and file formats

External tables support:

- **Storage**: AWS S3, Azure Blob Storage / ADLS Gen2, Google Cloud Storage

- **File formats**: Parquet, ORC, Avro (semi-structured); CSV, JSON, XML (with limitations)

Parquet is the format to use if you have a choice. Columnar storage means Snowflake can prune at the column level and skip blocks that do not match the query predicate. CSV requires reading entire files; Parquet allows selective column reads. The performance difference is significant for wide files.

Metadata columns and partitioning

Every external table automatically has a VALUE column (type VARIANT) containing the full parsed record and system metadata columns:

- METADATA$FILENAME — the path of the file the row came from

- METADATA$FILE_ROW_NUMBER — the row number within the file

- METADATA$FILE_LAST_MODIFIED — last modified timestamp of the file

You define additional columns using expressions that extract from the VALUE column. For example:

The column definition uses a cast from VALUE:order_id. The partition column is typically derived from the file path using METADATA$FILENAME, parsed with a split function to extract the date component from a path like orders/2026/05/25/batch_001.parquet.

Partition columns are critical for external table performance. Snowflake cannot push predicates into external storage without partitioning metadata. A query filtering on event_date against a partitioned external table scans only matching partition directories. The same query against an unpartitioned external table scans every file.

Automatic refreshes and event notifications

The external table's partition metadata must be kept in sync with what is in your object storage. Snowflake offers two mechanisms:

**Manual refresh**: ALTER EXTERNAL TABLE ... REFRESH. Scans the stage for new or removed files and updates the metadata. Required before queries see newly arrived files. Fine for low-frequency data; impractical for streaming or hourly data.

**Automatic refresh**: Uses cloud event notifications (S3 Event Notifications to SQS, Azure Event Grid, GCS Pub/Sub) to trigger metadata refresh as new files arrive. Near real-time. Requires configuring the event notification on the storage side. Once configured, external table metadata stays current without manual intervention.

For production external tables on data arriving more frequently than daily, configure automatic refresh.

External tables vs materialised tables: when to use each

External tables are NOT a general-purpose alternative to loading data. They have specific use cases where they are the right tool.

Use external tables when:

Data volume is very large and most queries touch a small fraction of it. An 8-year history of raw events at petabyte scale — if most queries touch the last 90 days, materialising 8 years of data in Snowflake is expensive. An external table with a date partition column lets you query recent data efficiently and rare historical queries run slower but at storage cost rather than Snowflake storage cost.

You need to query files that other systems also own. If your data lake files are produced and consumed by multiple systems (Spark, Presto, Athena, Snowflake), keeping them in object storage with a shared schema means all systems see the same data without synchronisation.

You are building a landing zone for raw ingest. Files land from source systems, you want to validate and profile them with SQL before deciding how to load them. External tables give you SQL access to files without commitment to a particular table structure.

Use materialised tables when:

Query performance matters. External table queries are slower than equivalent queries on Snowflake-managed Parquet due to the overhead of reading from object storage at query time, the absence of Snowflake micro-partition pruning, and the lack of clustering. For dashboards and interactive analytics, load data into Snowflake.

You need DML. External tables are read-only.

You need clustering, search optimisation, or dynamic data masking on row data. These Snowflake features apply to managed storage tables only.

Performance: what to expect

External table query performance depends primarily on:

**Partition pruning**: A well-partitioned external table on Parquet files can be quite fast. A date-partitioned external table with a date filter scans only the matching directories. Predicate pushdown works at the partition level.

**File size and count**: Many small files hurt. Snowflake opens each file independently. A directory with 10,000 files of 1MB each will be slower than a directory with 100 files of 100MB each, even with the same total data volume. Compaction of small files before querying significantly improves performance.

**File format**: Parquet >> CSV. Always use Parquet for external tables with any volume.

**Virtual columns**: Complex expressions in virtual column definitions (the column expressions on the VALUE column) add CPU overhead at query time. Keep them simple.

For a fair benchmark: external table queries against well-partitioned Parquet are typically 3–10x slower than equivalent queries against clustered Snowflake-managed tables. For analytical workloads that can tolerate that trade-off, external tables are viable. For interactive dashboards, they are not.

Materialised views on external tables

You can create materialised views on top of external tables. The materialised view is stored in Snowflake managed storage. It automatically refreshes when the external table's underlying data changes (using the same event notification mechanism). Queries against the materialised view use Snowflake micro-partition pruning and are as fast as any other materialised table.

This pattern — external table as the raw landing zone, materialised view as the queryable layer — is a pragmatic way to get Snowflake query performance while keeping raw data in object storage at lower cost.

Cost model

External tables have no Snowflake storage cost. You pay only for the object storage (S3/Azure/GCS), which is typically 60–80% cheaper than Snowflake storage.

Compute cost (virtual warehouse credits) is the same or higher than for materialised tables — external table queries use the same virtual warehouse, and may require larger warehouses or more credits due to the absence of micro-partition pruning.

The net cost trade-off depends on your query patterns. For data that is rarely queried (compliance archives, historical raw events), external tables are significantly cheaper. For data queried frequently, materialised tables are cheaper in practice because queries are more efficient.

For the broader data architecture decisions around Snowflake, see snowflake architecture best practices and data warehouse design. Our data architecture consulting practice can help you design the right storage and query strategy for your Snowflake environment — book a free audit.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →