A database index is a data structure that allows queries to find specific rows without scanning the entire table — like a book index that lets you jump to a page directly rather than reading from the beginning. This guide explains how indexes work, the common index types, and when to create or avoid them.
A database index is a data structure that allows the database engine to locate rows matching a query condition without scanning the entire table. Like a book index that lets you jump to a specific page, a database index lets the query executor jump to the rows matching a WHERE clause condition without reading every row from disk.
Indexes are one of the most impactful performance optimizations available for operational databases. A query that takes 30 seconds on a 100-million-row table without an index can return in milliseconds with the right index. They are also a common source of over-engineering — indexes have storage and write-performance costs, and unnecessary indexes slow down the writes they are meant to support.
How Indexes Work
The most common index type in relational databases is the B-tree (balanced tree) index. A B-tree index maintains a sorted data structure that stores index key values alongside pointers to the actual table rows. When a query filters WHERE customer_id = 12345, the database consults the B-tree index, traverses from the root to the leaf containing 12345, retrieves the row pointer, and fetches exactly those rows — without touching the rows with other customer_id values.
Without the index, the database performs a sequential scan — reading every row in the table to check whether the customer_id column matches. On large tables, sequential scans are slow. On small tables (a few thousand rows), sequential scans are often faster than index lookups because the overhead of index traversal exceeds the cost of reading the small table directly.
Index Types
**B-tree index:** The default and most common index. Efficient for equality (=), range (<, >, BETWEEN), and ORDER BY operations on indexed columns. The standard choice for most use cases.
**Hash index:** Stores a hash of the index key. Extremely fast for equality conditions (=) but useless for range queries — hash indexes cannot support ORDER BY or BETWEEN. Used in specific high-throughput equality-lookup scenarios.
**Composite index (multi-column index):** An index on multiple columns. A composite index on (customer_id, order_date) supports queries that filter on customer_id alone, or on both customer_id AND order_date, but not queries that filter only on order_date. The leftmost columns of a composite index must be used for the index to apply — this is the "leftmost prefix rule."
**Partial index:** An index on a subset of rows defined by a condition. An index on orders WHERE status = 'pending' indexes only pending orders. Queries filtering on status = 'pending' benefit from the partial index; queries for other statuses do not. Smaller and faster than a full index when only a subset of rows is frequently queried.
**Covering index:** An index that includes all the columns referenced by a query — so the database can answer the query entirely from the index without accessing the table. If a query selects customer_id, name, and email, and an index covers all three columns, the database never touches the table. Maximum performance for read-heavy patterns.
**Full-text index:** Optimized for text search across large string columns. Supports searches like "find all products whose description contains 'wireless charging'" through inverted index structures. Standard B-tree indexes are inefficient for substring searches.
When to Create Indexes
Create indexes on:
- **Primary keys:** Automatically indexed in most databases
- **Foreign keys:** Columns used in JOIN conditions — a foreign key index on orders.customer_key ensures joins to the customer dimension are fast
- **High-cardinality filter columns:** Columns frequently used in WHERE clauses with high selectivity (many distinct values) benefit most from indexing — customer_id, product_id, order_id
- **Columns in ORDER BY:** If queries frequently sort by a column, an index on that column can eliminate the sort step
Avoid creating indexes on:
- **Low-cardinality columns:** A boolean column (is_active = true/false) has only two distinct values; an index does not help the database significantly narrow down rows
- **Columns rarely used in filters:** An index that is never consulted by the query optimizer is pure overhead
- **Highly updated columns:** Each row update requires updating every index that includes the modified column; tables with very high write rates on indexed columns pay significant write overhead
Indexes in Cloud Data Warehouses
Traditional B-tree indexes are generally absent in cloud analytical data warehouses (BigQuery, Snowflake, Redshift). These systems use alternative mechanisms:
**BigQuery:** Partitioning and clustering. Partition pruning eliminates entire partitions (typically date ranges); clustering sorts data within partitions by specified columns so the storage layer can skip irrelevant data blocks.
**Snowflake:** Micro-partitioning stores data in small compressed columnar files. Automatic clustering (or manual CLUSTER BY definition) keeps related rows in the same micro-partitions, enabling partition pruning based on filter values.
**Redshift:** Sort keys and distribution keys. Sort keys determine the physical sort order of data in each node's storage; COMPOUND sort keys support multi-column sort order queries; INTERLEAVED sort keys weight all key columns equally.
The conceptual purpose is the same — avoid scanning data irrelevant to the query — but the mechanism is designed for the massively parallel columnar architecture of cloud warehouses rather than the row-oriented B-tree structure of transactional databases.
Our data engineering services and data architecture practice advises on index strategy for operational databases and partition/cluster strategy for cloud data warehouses. Contact us to discuss your query performance requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →