How to build a modern product analytics stack — event collection design, streaming ingestion, warehouse modelling for behavioural data, and the self-serve analytics layer that turns event data into product decisions without requiring SQL from every stakeholder.
Product analytics is the discipline of understanding how users interact with a software product: which features they use, where they drop off, and which behaviours predict retention or churn. The modern stack is shifting away from sending all events to Mixpanel or Amplitude and accepting whatever analysis those tools support, and towards a warehouse-native approach in which event data flows into the organisation's data warehouse and can be analysed with the full power of SQL.
Understanding the modern product analytics architecture — and the trade-offs between SaaS tools and warehouse-native approaches — is essential for data teams supporting product organisations.
Event Data: The Foundation
Product analytics starts with event data — every meaningful user action tracked as an event: page views, button clicks, feature activations, errors, conversions. Each event has:
- **Event name**: a string identifying what happened (page_viewed, feature_activated, checkout_completed)
- **Properties**: a set of key-value pairs describing the event context (user_id, session_id, product_name, revenue_amount, device_type)
- **Timestamp**: when the event occurred
- **User/device identifiers**: anonymous ID (pre-login), user ID (post-login), device ID
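The four parts listed above can be sketched as a small data structure. This is a minimal illustration, not a standard schema: the class name `Event`, the field names, and the JSON key names are assumptions chosen for the example.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    """One tracked user action (hypothetical shape, for illustration only)."""
    name: str                       # what happened, e.g. "checkout_completed"
    timestamp: datetime             # when the event occurred (UTC)
    anonymous_id: str               # assigned before login
    user_id: Optional[str] = None   # known only after login
    device_id: Optional[str] = None
    properties: dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise to a JSON string with an ISO-8601 timestamp."""
        return json.dumps({
            "event": self.name,
            "timestamp": self.timestamp.isoformat(),
            "anonymous_id": self.anonymous_id,
            "user_id": self.user_id,
            "device_id": self.device_id,
            "properties": self.properties,
        }, sort_keys=True)


example = Event(
    name="checkout_completed",
    timestamp=datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc),
    anonymous_id="anon-123",
    user_id="user-42",
    properties={"revenue_amount": 49.0, "device_type": "ios"},
)
```

Keeping both `anonymous_id` and `user_id` on every event is what later allows pre-login activity to be stitched to the logged-in user.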
**Event taxonomy design** is one of the highest-leverage decisions in product analytics. A well-designed event taxonomy is: consistent (every team uses the same event names and property names for the same concepts), complete (every meaningful user action is tracked), and stable (event names do not change without a migration plan).
Poor event taxonomies generate years of analytical debt: inconsistent naming across platforms (iOS vs Web vs Android tracking the same action as three different event names), missing properties on some events but not others, and deprecated event names that break historical analysis.
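One way to keep a taxonomy consistent is to validate events against a shared tracking plan before they are sent. The sketch below assumes a snake_case naming convention and a hypothetical `TRACKING_PLAN` mapping; neither is prescribed by any particular tool.

```python
import re

# Hypothetical tracking plan: event name -> property names every event must carry.
TRACKING_PLAN = {
    "page_viewed": {"user_id", "session_id", "device_type"},
    "checkout_completed": {"user_id", "session_id", "revenue_amount"},
}

# Assumed convention: lowercase words joined by underscores, at least two words.
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)+$")


def validate_event(name: str, properties: dict) -> list[str]:
    """Return a list of taxonomy violations; an empty list means the event conforms."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append(f"name '{name}' is not snake_case")
    if name not in TRACKING_PLAN:
        problems.append(f"name '{name}' is not in the tracking plan")
    else:
        missing = TRACKING_PLAN[name] - set(properties)
        if missing:
            problems.append(f"missing properties: {sorted(missing)}")
    return problems
```

Running a check like this in CI or at the SDK wrapper catches the cross-platform naming drift described above before it reaches the warehouse.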
Event Collection Infrastructure
**Client-side SDKs.** Most event collection starts with client-side tracking: JavaScript SDKs in web applications, iOS and Android SDKs in mobile apps. Open-source options include Segment's analytics.js, Jitsu, and RudderStack. Commercial options include Segment (now owned by Twilio), the Amplitude SDKs, and mParticle.
**Server-side tracking.** For events where client-side tracking is unreliable (payment confirmation, server-processed conversions), server-side event production ensures accuracy. Server-side events are produced directly from backend systems using HTTP APIs.
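As a rough sketch of server-side event production, the backend can build an HTTP POST carrying the event at the moment the payment is confirmed. The endpoint URL, payload keys, and function name below are hypothetical; a real collector's HTTP API will define its own path, authentication, and field names.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical collector endpoint; substitute your event router's HTTP API.
COLLECTOR_URL = "https://collector.example.com/v1/track"


def build_track_request(user_id: str, event: str, properties: dict,
                        occurred_at: datetime) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one server-side event."""
    body = json.dumps({
        "user_id": user_id,
        "event": event,
        "properties": properties,
        "timestamp": occurred_at.isoformat(),
    }).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_track_request(
    "user-42", "checkout_completed", {"revenue_amount": 49.0},
    datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc),
)
# To send: urllib.request.urlopen(request, timeout=5), ideally with retries.
```

Because the event is emitted by the system that actually processed the payment, it is not lost to ad blockers, closed tabs, or flaky mobile connections.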
**Customer Data Platforms as event routers.** Tools like Segment and RudderStack act as event routers: instrument once with their SDK, then route events to any downstream destination (Snowflake, BigQuery, Amplitude, Braze, etc.) without re-instrumenting for each destination. This is valuable for organisations sending events to multiple downstream systems.
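The "instrument once, route anywhere" idea can be illustrated with a toy fan-out router. This is a simplified sketch, not how Segment or RudderStack are implemented; the `EventRouter` class and the destination names are invented for the example.

```python
from typing import Callable

Event = dict
Destination = Callable[[Event], None]


class EventRouter:
    """Fan one tracked event out to every registered destination."""

    def __init__(self) -> None:
        self._destinations: dict[str, Destination] = {}

    def register(self, name: str, send: Destination) -> None:
        """Add a destination; `send` is any callable that accepts an event."""
        self._destinations[name] = send

    def track(self, event: Event) -> dict[str, bool]:
        """Deliver to each destination; one failure does not block the others."""
        results = {}
        for name, send in self._destinations.items():
            try:
                send(event)
                results[name] = True
            except Exception:
                results[name] = False  # a production router would queue and retry
        return results


# Two in-memory lists stand in for a warehouse loader and a SaaS analytics tool.
warehouse_rows, analytics_rows = [], []
router = EventRouter()
router.register("warehouse", warehouse_rows.append)
router.register("analytics_tool", analytics_rows.append)
delivery = router.track({"event": "feature_activated", "user_id": "user-42"})
```

Adding a new downstream system then means registering one more destination rather than re-instrumenting every client.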
Warehouse-Native vs SaaS Product Analytics
The core architectural choice: send events to a SaaS product analytics tool (Amplitude, Mixpanel, PostHog) for analysis, or send events to your data warehouse and analyse there.
SaaS product analytics (Amplitude, Mixpanel):
- Pre-built funnel, retention, and cohort analysis without writing SQL
- Real-time dashboards with minimal setup
- Non-technical product managers can analyse without engineering support
- Cost scales with event volume — expensive at high scale
- Limited to the analyses the tool supports; joining product events with other business data (revenue, support tickets, sales data) is difficult unless that data is also loaded into the tool
Warehouse-native product analytics:
- Full SQL flexibility — join event data to any other warehouse data
- Revenue, LTV, and business metrics computable alongside behavioural metrics
- Lower marginal cost at high event volumes
- Requires data engineering to build analysis models (dbt) and BI tooling for self-serve
- Higher setup investment; slower to first insight
**The common pattern:** use a SaaS tool (PostHog or Mixpanel) for product team self-serve analysis (funnels, retention, feature flags), and stream events to the warehouse for deeper analytics that require joining with business data. This hybrid architecture avoids the all-or-nothing choice.
Warehouse Event Modelling
When event data lands in the warehouse, it typically arrives as a raw events table: one row per event, with event properties stored either as nested JSON or as separate columns. Raw events are rarely useful for analysis directly; they need to be modelled into purpose-built datasets.
**Session modelling.** Group events into sessions: continuous periods of user activity, conventionally closed after 30 minutes of inactivity. Session-grain tables enable session-level metrics: sessions per user, session duration, session conversion rate.
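The sessionisation rule can be sketched in a few lines. In a warehouse this would normally be a SQL or dbt model using window functions; the Python below only illustrates the logic, and the function name and tuple layout are assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity gap that closes a session


def assign_sessions(
    events: list[tuple[str, datetime]],
) -> list[tuple[str, datetime, int]]:
    """Label each (user_id, timestamp) event with a per-user session number.

    A new session starts when the gap since the user's previous event
    is longer than SESSION_TIMEOUT.
    """
    last_seen: dict[str, datetime] = {}
    session_no: dict[str, int] = {}
    labelled = []
    # Process each user's events in chronological order.
    for user_id, ts in sorted(events, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(user_id)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            session_no[user_id] = session_no.get(user_id, 0) + 1
        last_seen[user_id] = ts
        labelled.append((user_id, ts, session_no[user_id]))
    return labelled
```

Session duration and events per session follow by grouping the labelled rows on `(user_id, session number)`.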
**Funnel modelling.** Define conversion funnels (steps a user takes on the way to a key outcome) and model the funnel as a table with one row per user per funnel, recording whether and when they completed each step. Funnel conversion rates, drop-off points, and time-to-conversion are all derivable from this model.
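A minimal sketch of the one-row-per-user funnel table follows. It assumes a strict ordered funnel (a step only counts once the previous step is complete) and a hypothetical three-step funnel built from the example event names; real funnels often add a conversion window as well.

```python
from datetime import datetime
from typing import Optional

# Hypothetical three-step funnel built from the example event names.
FUNNEL_STEPS = ["page_viewed", "feature_activated", "checkout_completed"]

FunnelRow = list[Optional[datetime]]  # completion time per step, None if not reached


def build_funnel(events: list[tuple[str, str, datetime]]) -> dict[str, FunnelRow]:
    """events are (user_id, event_name, timestamp); returns one row per user."""
    rows: dict[str, FunnelRow] = {}
    for user_id, name, ts in sorted(events, key=lambda e: e[2]):
        steps = rows.setdefault(user_id, [None] * len(FUNNEL_STEPS))
        # Index of the first step this user has not completed yet, if any.
        next_step = next((i for i, done in enumerate(steps) if done is None), None)
        if next_step is not None and name == FUNNEL_STEPS[next_step]:
            steps[next_step] = ts
    return rows


def step_conversion(rows: dict[str, FunnelRow]) -> list[float]:
    """Share of funnel entrants (users who completed step 1) reaching each step."""
    entrants = sum(1 for steps in rows.values() if steps[0] is not None)
    if entrants == 0:
        return [0.0] * len(FUNNEL_STEPS)
    return [
        sum(1 for steps in rows.values() if steps[i] is not None) / entrants
        for i in range(len(FUNNEL_STEPS))
    ]
```

Time-to-conversion is the difference between the last and first timestamps in a completed row, and the largest fall between adjacent conversion rates marks the main drop-off point.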
**Retention modelling.** Cohort retention analysis requires: identifying the user's first event (acquisition date), their activity in subsequent time periods, and whether they were retained. The standard approach: a user_activity table with one row per user per day/week/month they were active, which enables N-period retention calculation.
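To make the N-period calculation concrete, the sketch below computes weekly cohort retention from user activity rows. It assumes weeks start on Monday and that a user's cohort is the week of their first recorded activity; the function names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_retention(
    activity: list[tuple[str, date]], periods: int
) -> dict[date, list[float]]:
    """activity holds (user_id, active_day) rows.

    Returns cohort week -> share of that cohort active in week 0, 1, ..., periods - 1.
    """
    # Collapse daily activity into the set of weeks each user was active.
    active_weeks: dict[str, set[date]] = defaultdict(set)
    for user_id, day in activity:
        active_weeks[user_id].add(week_start(day))

    # A user's cohort is the first week in which they were active.
    cohorts: dict[date, list[set[date]]] = defaultdict(list)
    for weeks in active_weeks.values():
        cohorts[min(weeks)].append(weeks)

    return {
        cohort_week: [
            sum(1 for weeks in members if cohort_week + timedelta(weeks=n) in weeks)
            / len(members)
            for n in range(periods)
        ]
        for cohort_week, members in cohorts.items()
    }
```

Week 0 is always 1.0 by construction; the shape of the curve from week 1 onwards is what a retention chart plots.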
**User attribution.** Which acquisition channel, campaign, or touchpoint should be credited for a conversion? Attribution models (first-touch, last-touch, linear, time-decay) can be implemented on the event data, typically as dbt models.
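The four attribution models differ only in how they weight the touchpoints that precede a conversion. The sketch below illustrates this under stated assumptions: the function name, the model labels, and the 7-day half-life used for time decay are choices made for the example, not a standard.

```python
from collections import defaultdict
from datetime import datetime


def attribute(
    touchpoints: list[tuple[str, datetime]],
    converted_at: datetime,
    model: str,
    half_life_days: float = 7.0,  # assumed decay rate for the time_decay model
) -> dict[str, float]:
    """Split one conversion's credit across channels; credits sum to 1.0."""
    # Only touchpoints at or before the conversion can earn credit.
    touches = sorted(
        (t for t in touchpoints if t[1] <= converted_at), key=lambda t: t[1]
    )
    if not touches:
        return {}
    if model == "first_touch":
        weights = [1.0] + [0.0] * (len(touches) - 1)
    elif model == "last_touch":
        weights = [0.0] * (len(touches) - 1) + [1.0]
    elif model == "linear":
        weights = [1.0] * len(touches)
    elif model == "time_decay":
        # Weight halves for every half_life_days between touch and conversion.
        weights = [
            0.5 ** ((converted_at - ts).total_seconds() / 86400 / half_life_days)
            for _, ts in touches
        ]
    else:
        raise ValueError(f"unknown attribution model: {model}")
    total = sum(weights)
    credit: dict[str, float] = defaultdict(float)
    for (channel, _), weight in zip(touches, weights):
        credit[channel] += weight / total
    return dict(credit)
```

For touches 14, 7, and 0 days before a conversion, time decay with a 7-day half-life gives raw weights of 0.25, 0.5, and 1.0, so the most recent channel receives 4/7 of the credit.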
Self-Serve Analytics Layer
The analytical value of product event data depends on product managers and business stakeholders being able to explore it without filing SQL requests. The self-serve layer sits between the modelled warehouse data and the end users.
**For product and growth teams:** BI tools with good self-service capability (Looker, Mode, Metabase, Superset) expose the modelled event data as dashboards and explorable datasets. Key dashboards: retention curves, funnel analysis, feature adoption by cohort, revenue attribution by channel.
**For data scientists:** direct warehouse access with SQL + Python (Databricks notebooks, Hex, Observable) for cohort analysis, predictive modelling, and A/B test analysis.
**For product-managed self-serve:** PostHog (open-source, self-hostable) provides Amplitude/Mixpanel-style funnel and retention analysis with the option to self-host, so events stay in infrastructure you control (recent self-hosted releases store events in ClickHouse, with PostgreSQL holding application metadata). This gives product teams a SaaS-like experience against self-hosted event storage.
For data architecture design including product analytics stack design and event modelling, our data architecture consulting practice helps organisations build scalable product analytics infrastructure — contact us to discuss your requirements.