Real-Time Analytics Architecture: Designing for Sub-Second Query Latency

The architectural patterns and technology choices behind real-time analytics — stream processing with Kafka and Flink, OLAP databases designed for high-concurrency sub-second queries, the lambda and kappa architecture trade-offs, and how to decide when real-time analytics is actually necessary versus when batch is sufficient.

Most analytics workloads do not need to be real-time. Batch processing — running transformations every hour, every four hours, overnight — is cheaper, simpler, and easier to debug. The first question in any real-time analytics architecture discussion is whether real-time is actually required. The second question is what "real-time" means for the specific use case — because sub-second, sub-minute, and sub-hour have radically different architectural implications.

With that framing established, here is how to design analytics infrastructure when real-time latency is genuinely required.

## Defining the Latency Requirement

Real-time analytics is not a single architectural category. The latency target determines the architecture.

**Sub-second (0–1 second):** Operational dashboards, fraud detection, live gaming leaderboards. Requires a streaming database or in-memory OLAP with direct event stream ingestion. No batch transformation layer. Examples: Apache Pinot, ClickHouse, Apache Druid.

**Near-real-time (1–60 seconds):** Operational monitoring, alerting dashboards, live customer-facing metrics. A stream processor transforms events before loading them into the OLAP database, using either micro-batch execution (Spark Structured Streaming) or true streaming (Flink). Example: Kafka + Flink + ClickHouse.

**Sub-hour (1–60 minutes):** Most operational analytics, marketing dashboards, support team dashboards. Frequent batch micro-cycles, for example dbt on Databricks or Snowflake Dynamic Tables. This is the zone where modern cloud warehouses close the gap with streaming infrastructure.

**Hourly+ (batch):** Strategic analytics, financial reporting, data science pipelines. Standard batch ELT — Fivetran + dbt + Snowflake or BigQuery. This is where most analytics workloads belong.

Pinning down the latency requirement before selecting an architecture avoids the most common failure mode: over-engineering a streaming architecture for a use case that sub-hourly batch would serve at a fraction of the operational complexity.
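As a minimal sketch of the tiers above, the mapping from a freshness target to an architecture category can be written as a small function. The names `Tier` and `classify_latency` are illustrative, and the thresholds simply restate the boundaries listed in this section.

```python
from enum import Enum


class Tier(Enum):
    SUB_SECOND = "real-time OLAP with direct stream ingestion"
    NEAR_REAL_TIME = "stream processing feeding an OLAP database"
    SUB_HOUR = "frequent batch micro-cycles on a cloud warehouse"
    BATCH = "standard batch ELT"


def classify_latency(target_seconds: float) -> Tier:
    """Map a freshness target (seconds from event to queryable) to a tier."""
    if target_seconds <= 0:
        raise ValueError("latency target must be positive")
    if target_seconds <= 1:
        return Tier.SUB_SECOND
    if target_seconds <= 60:
        return Tier.NEAR_REAL_TIME
    if target_seconds <= 3600:
        return Tier.SUB_HOUR
    return Tier.BATCH
```

A dashboard that must reflect events within 15 minutes gives `classify_latency(15 * 60)`, which lands in the sub-hour tier rather than anything requiring a streaming stack.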

## Stream Processing Architecture

For latency requirements below 60 seconds, stream processing is typically required. The core components:

### Event Streaming Platform

Apache Kafka is the production standard for high-throughput event streaming. It provides durable, ordered, partitioned event logs that multiple consumers can read independently. Events are retained for configurable periods — typically 7 to 30 days — allowing reprocessing without re-ingestion from source systems.

Kafka serves two roles in a real-time analytics architecture: event ingestion (collecting events from application systems) and event distribution (routing processed events to downstream consumers including OLAP databases and data warehouses).
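To make the log model concrete, here is a toy in-memory sketch of a partitioned, append-only log with independent consumer positions. It is not the Kafka client API: the class `PartitionedLog` and its methods are hypothetical, and `crc32` stands in for Kafka's own key-hashing partitioner.

```python
from collections import defaultdict
from zlib import crc32


class PartitionedLog:
    """Toy in-memory model of a partitioned, append-only event log."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read position per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def append(self, key: str, value: dict) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        # (crc32 is an assumption for illustration, not Kafka's partitioner.)
        partition = crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition

    def poll(self, group: str) -> list:
        """Return unread events for one group and advance only its offsets."""
        unread = []
        for p, events in enumerate(self.partitions):
            start = self.offsets[group][p]
            unread.extend(events[start:])
            self.offsets[group][p] = len(events)
        return unread

    def rewind(self, group: str) -> None:
        """Reset a group to the start of the log, e.g. for reprocessing."""
        self.offsets[group] = [0] * len(self.partitions)
```

Because offsets are tracked per group, an OLAP loader and a warehouse loader can read the same events at their own pace, and either one can rewind to reprocess retained history without touching the source systems.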

For cloud-native environments, Confluent Cloud (managed Kafka), Amazon MSK (managed Kafka), or Redpanda (a Kafka-compatible broker positioned as lower latency) are common choices. All three expose the Kafka protocol, so producer and consumer code is largely portable between them.

### Stream Processing Engine

Raw events from Kafka typically need transformation before they are analytically useful: enrichment with reference data, aggregation into time-window summaries, deduplication, and joining of related event streams.
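Two of these transformations, deduplication and time-window aggregation, can be sketched in plain Python. This illustrates the logic only, not any engine's API; the function name and the event fields `event_id`, `key`, and `ts` are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Deduplicate by event_id, then count events per (window_start, key).

    Each event is a dict with hypothetical fields: event_id, key, and ts
    (event time in epoch seconds).
    """
    seen_ids = set()
    counts = defaultdict(int)
    for event in events:
        if event["event_id"] in seen_ids:
            continue  # drop a duplicate delivery of the same event
        seen_ids.add(event["event_id"])
        # Align event time to the start of its fixed, non-overlapping window.
        window_start = event["ts"] - (event["ts"] % window_seconds)
        counts[(window_start, event["key"])] += 1
    return dict(counts)
```

A production stream processor does the same work incrementally and bounds the state it keeps (the `seen_ids` set here grows without limit), which is a large part of why stateful engines are harder to operate than this sketch suggests.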

**Apache Flink** is the production standard for stateful stream processing at scale. It provides exactly-once state consistency (end-to-end exactly-once additionally depends on the sources and sinks), low-latency event-time processing, flexible windowing (tumbling, sliding, session windows), and native Kafka integration. Flink is operationally complex; managed options (Confluent Cloud for Apache Flink, or Amazon Managed Service for Apache Flink, formerly Kinesis Data Analytics) reduce the infrastructure burden.

**Spark Structured Streaming** is more familiar to data engineering teams with Spark experience. It uses a micro-batch execution model — processing accumulated events every 100ms to 10s rather than event-by-event — which introduces slightly higher latency than Flink but simplifies semantics and reduces operational complexity.

**ksqlDB** (from Confluent) provides a SQL interface for stream processing within the Kafka ecosystem. Lower barrier to entry, less flexible for complex transformations, appropriate for moderate complexity use cases.

### Real-Time OLAP Database

The stream processor outputs transformed events to an OLAP database designed for high-concurrency, sub-second analytical queries. These databases are architecturally different from batch data warehouses: they optimise for write throughput and query latency at the expense of complex join capability and transformation flexibility.

**Apache Pinot:** Open-source, designed for user-facing real-time analytics. It originated at LinkedIn and is also run in production at Uber and Stripe. Segment-based architecture with real-time and offline segments. Excellent for high-cardinality dimensions and scatter-gather query patterns.

**Apache Druid:** Open-source, designed for sub-second OLAP on event data. Strong for time-series analytics, pre-aggregation via rollup for high-volume data. Used extensively for operational monitoring and clickstream analytics.

**ClickHouse:** Column-store OLAP database with exceptional query performance on large datasets. Supports both streaming ingestion (via Kafka connector) and batch loading. Lower operational complexity than Pinot or Druid. Increasingly common for real-time analytics use cases at moderate scale.

**StarRocks / Apache Doris:** Newer generation of real-time OLAP databases with support for both real-time ingestion and standard SQL joins. Emerging choice for teams that need real-time latency without sacrificing query expressiveness.
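The rollup pre-aggregation mentioned for Druid is one reason these databases can answer quickly on high-volume event data: rows are collapsed at ingestion time. A minimal Python sketch of the idea follows; the function name and the fields `ts`, `country`, `page`, and `bytes` are hypothetical, and this is not Druid's ingestion spec.

```python
from collections import defaultdict


def rollup(events, granularity_seconds=3600):
    """Collapse raw events into one row per (time bucket, dimensions)."""
    rows = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for e in events:
        # Truncate the timestamp to the start of its bucket.
        bucket = e["ts"] - (e["ts"] % granularity_seconds)
        key = (bucket, e["country"], e["page"])
        rows[key]["count"] += 1
        rows[key]["bytes_sum"] += e["bytes"]
    return dict(rows)
```

The trade-off is that individual events are no longer recoverable from the rolled-up table, so rollup suits metrics that are always queried in aggregate.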

## Lambda vs Kappa Architecture

Two dominant architectural patterns organise the relationship between stream and batch processing.

### Lambda Architecture

Lambda architecture maintains two parallel processing layers:

**Speed layer:** Stream processing pipeline (Kafka + Flink) producing low-latency views with recent data. Updated continuously as events arrive.

**Batch layer:** Traditional batch ELT pipeline (dbt + warehouse) producing accurate, complete historical views. Updated on a schedule.

**Serving layer:** Query router that merges speed layer results (recent) with batch layer results (historical) for end users.

The advantage: the batch layer provides a reliable source of truth even if the speed layer has issues. The disadvantage: you maintain two processing codebases for the same data, and merging the results at query time introduces complexity.
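The merge performed by the serving layer might look like the following sketch. It assumes both layers produce per-hour counts keyed by the hour's start time, and that the batch layer publishes a watermark up to which its results are complete; `serve` and its parameters are illustrative names.

```python
def serve(batch_view, speed_view, batch_watermark):
    """Merge per-hour counts: batch wins before its watermark, speed after.

    batch_view / speed_view: dict mapping hour_start (epoch seconds) -> count.
    batch_watermark: batch results are complete for hours starting before it.
    """
    merged = {h: c for h, c in batch_view.items() if h < batch_watermark}
    for hour, count in speed_view.items():
        if hour >= batch_watermark:
            merged[hour] = count
    return merged
```

Even in this small form the cost of the pattern shows: the two views must agree on keys, units, and the watermark boundary, and any change to the metric definition has to be made in both pipelines.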

Lambda architecture made sense when stream processing engines had weaker exactly-once guarantees. With modern Flink, the reprocessing safety net of a parallel batch layer is less necessary — which is why Kappa is gaining ground.

### Kappa Architecture

Kappa architecture eliminates the batch layer. Stream processing handles everything: real-time ingestion, transformation, and historical reprocessing (by replaying from Kafka or a durable event store).

The advantage: one processing codebase, no merge logic at query time, simpler operations. The disadvantage: historical reprocessing requires either long Kafka retention or a separate event store, and complex historical transformations are harder to express as stream processing logic than as dbt SQL.
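Historical reprocessing in a Kappa design amounts to replaying the retained log through a new version of the same transformation. A minimal sketch, with hypothetical event fields (`customer`, `type`, `amount`) and function names:

```python
def build_view(event_log, transform):
    """Fold the full event log through one transform to build a view."""
    view = {}
    for event in event_log:
        transform(view, event)
    return view


def revenue_v1(view, event):
    # Original logic: every event adds to the customer's revenue.
    view[event["customer"]] = view.get(event["customer"], 0) + event["amount"]


def revenue_v2(view, event):
    # Changed logic: refunds now subtract instead of counting as revenue.
    sign = -1 if event["type"] == "refund" else 1
    view[event["customer"]] = view.get(event["customer"], 0) + sign * event["amount"]
```

Deploying the corrected logic means running `build_view(log, revenue_v2)` over the full retained history and swapping the result in for the old view, which is only possible if the log still holds every event the view depends on.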

Kappa is the right choice for greenfield real-time analytics systems where the event stream is the source of truth. Lambda is more appropriate when you have an existing batch pipeline you cannot eliminate and need to add real-time capability on top.

## The Role of Modern Cloud Warehouses

The boundary between streaming and batch analytics is eroding. Modern cloud warehouses provide capabilities that close the gap:

**Snowflake Dynamic Tables:** Tables defined by a query that Snowflake refreshes automatically to meet a configured target lag, which can be set as low as 1 minute. For many use cases loosely labelled "near-real-time" (5–60 minute latency requirements), Dynamic Tables on Snowflake can replace a streaming architecture entirely.

**BigQuery streaming ingestion and continuous queries:** BigQuery ingests streaming data in near-real-time (for example through the Storage Write API) and makes it queryable within seconds of arrival. For teams already on BigQuery, native streaming ingest eliminates the need for a separate streaming stack for many use cases.

**Databricks Structured Streaming on Delta Lake:** Delta tables with streaming reads and writes enable sub-minute latency on Databricks without a separate OLAP database.

For latency requirements above 5 minutes, the overhead of a full Kafka + Flink + Pinot streaming stack is rarely justified when a cloud warehouse can deliver the latency requirement with dramatically lower operational complexity.

## When Real-Time Analytics Is Not Worth It

As a rough rule of thumb, real-time analytics infrastructure costs 3–5× more to build and operate than equivalent batch infrastructure. The ongoing operational burden is also higher: more infrastructure components, more failure modes, and more complex debugging.

The business case for real-time analytics requires that the value of fresher data exceeds this cost premium. In practice, this is true for:

- Customer-facing analytics (dashboards in your product that users see)

- Fraud detection and risk scoring

- Live operational monitoring (infrastructure, supply chain, customer support queue)

- Markets and pricing systems

It is rarely true for: internal business intelligence, strategic reporting, marketing attribution, most financial analytics. For these workloads, sub-hourly batch delivers equivalent decision-making value at a fraction of the infrastructure cost.
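The cost comparison above can be turned into a quick break-even check. This sketch assumes the cost premium is expressed as a multiplier on annual batch cost, as in the 3–5× figure; the function name and the inputs in the usage note are hypothetical.

```python
def min_value_of_freshness(batch_annual_cost, cost_multiplier):
    """Extra annual value real-time must deliver to justify its premium."""
    if cost_multiplier < 1:
        raise ValueError("multiplier is relative to batch cost; expected >= 1")
    return batch_annual_cost * (cost_multiplier - 1)
```

For a hypothetical batch platform costing 100,000 per year, a 3× multiplier means fresher data must be worth at least 200,000 per year, and a 5× multiplier raises that to 400,000.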

