What a feature store is, what problems it solves (training-serving skew, feature recomputation, cross-team reuse), the major options (Feast, Tecton, Vertex AI Feature Store, Databricks Feature Store), and when the operational overhead is justified versus simpler alternatives.
A feature store is a centralised platform for managing the features (inputs) used to train and serve machine learning models. It is one of the most debated components in the ML infrastructure landscape — genuinely solving important problems in mature ML organisations, but adding significant operational overhead in contexts where the problems it solves have not yet materialised.
Understanding what a feature store actually solves, the options available, and when the investment is justified is essential for data and ML engineering teams evaluating their infrastructure.
The Problems Feature Stores Solve
**Training-serving skew.** Among the most serious ML reliability problems: the features used to train a model are computed differently from the features used at inference time. For example, a customer churn model is trained on features computed by a historical batch pipeline, while at inference time the same features are computed by a real-time pipeline with slightly different logic. The model was trained on one distribution and is served inputs from another. The result is silent model degradation that is difficult to diagnose.
A feature store solves this by being the single computation environment for feature values, used both in training data generation and in real-time inference. Train and serve from the same code path.
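The single-code-path idea can be sketched in plain Python. This is a minimal illustration, not any particular feature store's API: the function names, the `purchase_frequency_30d` feature, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

def purchase_frequency_30d(purchase_times, as_of):
    """Count purchases in the 30 days up to and including `as_of`."""
    window_start = as_of - timedelta(days=30)
    return sum(1 for t in purchase_times if window_start < t <= as_of)

def build_training_row(purchase_times, label_time, churned):
    # Training path: the feature is evaluated as of the historical label time.
    return {
        "purchase_frequency_30d": purchase_frequency_30d(purchase_times, label_time),
        "churned": churned,
    }

def build_serving_row(purchase_times, request_time):
    # Serving path: the same function, evaluated as of the request time.
    return {"purchase_frequency_30d": purchase_frequency_30d(purchase_times, request_time)}

# Hypothetical purchase history for one customer.
purchases = [datetime(2024, 1, 5), datetime(2024, 1, 20), datetime(2024, 3, 1)]
train_row = build_training_row(purchases, datetime(2024, 2, 1), churned=True)
serve_row = build_serving_row(purchases, datetime(2024, 2, 1))
```

Because both paths call one function, a change to the window logic reaches training and serving together, which is the property a feature store aims to guarantee at platform level.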
**Feature recomputation waste.** In organisations with multiple ML models, the same features are commonly needed by multiple models. Without a feature store, each model's team independently computes the same features — duplicating engineering work and compute cost. A feature store computes each feature once and stores it for reuse.
**Feature discoverability.** In organisations with many ML projects, feature definitions and computation logic are scattered across notebooks, pipelines, and individual repositories. A new team building a model has no way to discover that a feature they need was already computed by another team. A feature store provides a catalogue of available features with documentation.
**Point-in-time correct training data.** Training an ML model requires historical examples of (features, label) pairs. The features for each training example should reflect what was known at the time the label was generated — not the most recent feature values. "What was the customer's purchase frequency in the 30 days before they churned?" requires retrieving historical feature values as of a specific timestamp. Feature stores with time travel capability enable point-in-time correct training data generation.
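A point-in-time lookup can be sketched with the standard library alone. The data layout below (a per-customer list of timestamped values, sorted by time) and the helper name `value_as_of` are assumptions made for illustration; a real feature store would run the equivalent join against the offline store.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history per customer: (effective_timestamp, value),
# sorted by timestamp in ascending order.
feature_history = {
    "c1": [
        (datetime(2024, 1, 1), 4),
        (datetime(2024, 2, 1), 2),
        (datetime(2024, 3, 1), 0),
    ],
}

def value_as_of(history, as_of):
    """Return the latest value with timestamp <= as_of, or None if none exists."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None

# Labels carry the time at which the outcome was observed.
labels = [
    ("c1", datetime(2024, 2, 15), True),
    ("c1", datetime(2023, 12, 1), False),
]

training_rows = [
    {
        "customer_id": cid,
        "purchase_frequency_30d": value_as_of(feature_history[cid], label_time),
        "churned": churned,
    }
    for cid, label_time, churned in labels
]
```

The first label picks up the value recorded on 1 February rather than the later March value, so no future information leaks into the training row. The second label predates all recorded values and gets `None`.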
Feature Store Architecture
A feature store has three main components:
**Offline store.** A data warehouse or data lake where historical feature values are stored for training data generation. Querying the offline store for training data requires time travel (point-in-time lookups). Typically backed by Snowflake, BigQuery, Redshift, or Parquet on object storage.
**Online store.** A low-latency key-value store where current feature values are materialised for real-time inference. The online store must return feature values in milliseconds. Typically backed by Redis, DynamoDB, Cassandra, or a similar low-latency store.
**Materialisation pipeline.** A pipeline that computes feature values from raw data and writes them to both the offline and online stores, ensuring they stay synchronised.
The key architectural challenge: keeping the offline store (used for training, historical) and online store (used for serving, current) consistent. Inconsistency between them causes training-serving skew.
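The relationship between the two stores can be sketched with in-memory stand-ins. This is a toy model under stated assumptions: a Python list plays the offline store, a dict plays the online store, and `materialise` and `get_online_feature` are hypothetical names rather than any product's API.

```python
from datetime import datetime

# Offline store stand-in: append-only history of every computed value.
offline_store = []  # rows of (entity_id, feature_name, timestamp, value)

# Online store stand-in: only the latest value per entity and feature.
online_store = {}   # (entity_id, feature_name) -> (timestamp, value)

def materialise(entity_id, feature_name, timestamp, value):
    """Write one computed feature value to both stores."""
    offline_store.append((entity_id, feature_name, timestamp, value))
    key = (entity_id, feature_name)
    current = online_store.get(key)
    # Overwrite the online value only with newer data, so a late-arriving
    # historical row cannot roll the served value back.
    if current is None or timestamp >= current[0]:
        online_store[key] = (timestamp, value)

def get_online_feature(entity_id, feature_name):
    """Return the current value for serving, or None if never materialised."""
    entry = online_store.get((entity_id, feature_name))
    return entry[1] if entry is not None else None

materialise("c1", "purchase_frequency_30d", datetime(2024, 2, 1), 2)
materialise("c1", "purchase_frequency_30d", datetime(2024, 3, 1), 0)
materialise("c1", "purchase_frequency_30d", datetime(2024, 1, 1), 4)  # late-arriving row
```

Writing both stores from one function is what keeps them consistent in this sketch. In production the same guarantee has to hold across separate systems, which is where much of the operational cost comes from.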
Major Options
**Feast** (open source): the most widely adopted open-source feature store. Feast is infrastructure-agnostic: it supports a range of pluggable offline stores (including Snowflake, BigQuery, and Parquet files) and online stores (including Redis and DynamoDB). Feast handles materialisation from offline to online, the feature registry (catalogue), and the SDK for training data generation and online retrieval.
Feast requires significant setup: deploying the registry, configuring offline and online stores, building materialisation pipelines. It is not a turn-key solution. But for organisations that want control over their infrastructure and do not want vendor lock-in, Feast is the right starting point.
**Tecton** (managed service): a commercial feature store platform with managed infrastructure. Tecton provides end-to-end feature management — feature engineering in Python/SQL, orchestrated materialisation to online and offline stores, monitoring, and observability. Higher cost than Feast but significantly lower operational overhead.
**Databricks Feature Store** (now Feature Engineering in Unity Catalog): the native feature store integrated into Databricks. If you are already on Databricks, this is the lowest-friction option: feature tables are stored as Delta tables, tracked and versioned by the platform, and integrate natively with MLflow on Databricks for model training and serving.
**Vertex AI Feature Store** (Google Cloud): managed feature store for ML workflows on Google Cloud. Native integration with BigQuery (offline store) and Bigtable (online store). For organisations with GCP-native ML infrastructure, Vertex AI Feature Store reduces operational overhead.
**Amazon SageMaker Feature Store** (AWS): the AWS counterpart to Vertex AI Feature Store. The offline store is backed by S3, and the online store by a managed low-latency data store.
When a Feature Store Is Justified
A feature store is the right investment when:
- Multiple ML models in production with overlapping feature requirements (reuse value)
- Training-serving skew has caused observable model degradation that is difficult to diagnose (pain point)
- Real-time inference requirements with latency budgets under 100ms (online store need)
- Multiple ML engineering teams that need to share features without ad-hoc coordination
- Regulatory requirements for reproducible model training (audit trail of feature values at training time)
A feature store is not justified when:
- One or two models in production, all batch inference — the simplest solution is a well-documented dbt pipeline
- Small team where communication is not yet a bottleneck
- Proof-of-concept or early-stage ML — feature store setup costs time that is better spent validating the ML use case
**The most common feature store mistake**: implementing a feature store before you have multiple models in production, before training-serving skew has been a problem, because the architecture looks mature. The operational overhead of maintaining a feature store (materialisation pipelines, online store operations, registry maintenance) is real. Build what you need, when you need it.
For ML infrastructure design including feature store evaluation, our data architecture consulting practice can advise on whether and what to build — contact us to discuss your ML data infrastructure requirements.