E-commerce businesses have access to richer behavioural data than almost any other type of business. Every browsing session, product view, add-to-cart event, purchase, return, and customer service interaction is observable and attributable to a specific user or session. This richness is both the opportunity and the analytical challenge. The data can answer almost any customer or commercial question that drives growth decisions, but the architectural patterns and metric definitions needed to answer those questions correctly are not obvious, and they are often implemented incorrectly.
The Event Data Foundation
E-commerce analytics starts with event data: a record of every user action on the site or app, with enough context to establish what happened, when, to whom, and through which device and channel.
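As a minimal sketch of what such a record might look like, the dataclass below captures the "what, when, to whom, and in what context" of a single event. The field names (`event_name`, `anonymous_id`, `session_id`, and so on) are hypothetical and not taken from any specific tracking tool. The five `utm_*` keys are the standard UTM campaign parameters, extracted here with the standard library's `urllib.parse`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse, parse_qs

# The standard UTM campaign parameters.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")

@dataclass
class Event:
    """One user action. Field names are hypothetical, for illustration only."""
    event_name: str                      # what happened, e.g. "add_to_cart"
    occurred_at: datetime                # when (stored in UTC)
    anonymous_id: str                    # to whom: cookie/device ID, always present
    session_id: str
    customer_id: Optional[str] = None    # to whom: known customer, once identified
    device_type: Optional[str] = None    # context
    referrer: Optional[str] = None
    page_url: Optional[str] = None
    properties: dict = field(default_factory=dict)  # event-specific payload

def utm_params(page_url: str) -> dict:
    """Extract UTM campaign parameters from a landing-page URL."""
    query = parse_qs(urlparse(page_url).query)
    # parse_qs returns a list per key; keep the first value of each UTM key present.
    return {k: query[k][0] for k in UTM_KEYS if k in query}

ev = Event(
    event_name="session_start",
    occurred_at=datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc),
    anonymous_id="anon-123",
    session_id="s-1",
    page_url="https://shop.example/?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale",
)
print(utm_params(ev.page_url))
# → {'utm_source': 'google', 'utm_medium': 'cpc', 'utm_campaign': 'spring_sale'}
```

Storing the UTM values on the session-start event means every later event in the session can inherit them through `session_id`, without re-parsing URLs downstream.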
Key events that every e-commerce analytics architecture must capture:
**Session and identity events**: Session start, session end, device type, referral source, and UTM parameters. The referral source captured at session start attributes everything that happens later in that session. UTM parameters on paid traffic (campaign, ad group, creative) enable channel- and campaign-level performance measurement.
**Product discovery events**: Page views (category pages, product pages), search queries (with search terms and number of results returned), filter applications. Product discovery events reveal where customers are finding (and failing to find) the products that convert.
**Cart events**: Add to cart (product ID, variant, quantity, price), remove from cart, and cart abandonment (usually inferred when a session ends with items still in the cart, rather than emitted as an event). Cart conversion rate and cart abandonment rate are the primary measures of purchase funnel efficiency.
**Transaction events**: Purchase completed, with order ID, line items (product, variant, quantity, price, discount applied), payment method, shipping method, and any promotional code applied. Transaction data is the foundation of all commercial analytics.
**Post-purchase events**: Order shipped, order delivered, return initiated, return completed. Post-purchase events feed into net revenue calculations (gross minus returns) and service-level metrics.
**Customer identity resolution**: Anonymous visitors become known customers at registration or purchase. The identity resolution event (anonymous visitor ID → customer ID) is critical for attribution and LTV calculations. Without it, the browsing that preceded a purchase cannot be linked to the customer who made it.
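Identity resolution can be sketched as two steps: build a map from anonymous IDs to customer IDs using any event that carries both, then backfill the customer ID onto that visitor's earlier anonymous events. This is a minimal sketch under simplifying assumptions: events are plain dicts with hypothetical keys (`ts`, `anonymous_id`, `customer_id`), and each anonymous ID is assumed to belong to one customer (the first one it was linked to), so shared devices are ignored.

```python
from typing import Dict, List, Optional

def build_identity_map(events: List[dict]) -> Dict[str, str]:
    """Map each anonymous_id to the first customer_id it was linked to.

    Any event carrying both IDs (login, registration, purchase) is treated as
    the identity-resolution signal. Shared devices are deliberately ignored.
    """
    identity: Dict[str, str] = {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        anon, cust = ev.get("anonymous_id"), ev.get("customer_id")
        if anon and cust and anon not in identity:
            identity[anon] = cust
    return identity

def stitch(events: List[dict]) -> List[dict]:
    """Return copies of the events with customer_id backfilled where known."""
    identity = build_identity_map(events)
    stitched = []
    for ev in events:
        resolved: Optional[str] = ev.get("customer_id") or identity.get(ev.get("anonymous_id"))
        stitched.append({**ev, "customer_id": resolved})  # originals left untouched
    return stitched

events = [
    {"ts": 1, "event": "product_view", "anonymous_id": "a1", "customer_id": None},
    {"ts": 2, "event": "add_to_cart", "anonymous_id": "a1", "customer_id": None},
    {"ts": 3, "event": "purchase", "anonymous_id": "a1", "customer_id": "c42"},
    {"ts": 4, "event": "product_view", "anonymous_id": "a2", "customer_id": None},
]
for ev in stitch(events):
    print(ev["event"], ev["customer_id"])
```

The pre-purchase product view and add-to-cart for visitor `a1` are now attributed to customer `c42`, while `a2` stays anonymous until it is identified.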
Conversion Funnel Metrics
The conversion funnel from first visit to purchase has standard stages and standard metrics at each stage:
**Traffic metrics**: Sessions, unique visitors, new vs. returning visitors, traffic by channel (organic search, paid search, email, direct, social, referral). Traffic metrics are the top of the funnel: the volume entering it.
**Engagement metrics**: Pages per session, session duration, bounce rate. These indicate whether visitors engage with the site or leave without interacting. A high bounce rate from paid channels usually points to a targeting or landing-page relevance problem.
**Category and product page metrics**: Page views per product, time on product page, and product-to-cart rate (the share of users who viewed a product and then added it to cart). These reveal products that attract interest but do not convert.
**Add-to-cart metrics**: Add-to-cart rate (sessions with add-to-cart event / total sessions), cart items per session, cart value per session. The gap between product page views and adds-to-cart indicates product page conversion problems (price, description, images, reviews).
**Checkout metrics**: Cart-to-checkout rate, checkout-to-purchase rate (its complement is the checkout abandonment rate), and the checkout step reached before abandonment. Checkout abandonment by step reveals specific UX or technical friction.
**Purchase metrics**: Conversion rate (purchases / sessions), average order value (AOV: revenue / orders), revenue, and transactions. These are the primary commercial metrics.
Each transition in the funnel — visit to engagement, engagement to product view, product view to cart, cart to purchase — has a rate that is measurable and improvable. The funnel analysis identifies where the largest conversion drop occurs and focuses optimisation effort there.
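The funnel metrics above can be computed from one row per session. This is a minimal sketch assuming hypothetical boolean stage flags on each session dict, a `revenue` value on purchasing sessions, and at most one order per session. It uses a strict funnel (a session counts at a stage only if it reached every earlier stage) and reports the transition with the lowest step-through rate as the largest drop.

```python
from typing import Dict, List

# Ordered funnel stages; every session row counts as a "session".
STAGES = ["session", "product_view", "add_to_cart", "checkout", "purchase"]

def funnel_report(sessions: List[dict]) -> dict:
    """Stage counts, step-to-step rates and headline funnel metrics.

    Strict funnel: a session counts at a stage only if it also reached every
    earlier stage. Stage flags and 'revenue' are hypothetical field names.
    """
    counts: Dict[str, int] = {stage: 0 for stage in STAGES}
    revenue = 0.0
    for sess in sessions:
        counts["session"] += 1
        for stage in STAGES[1:]:
            if not sess.get(stage):
                break  # the session dropped out before this stage
            counts[stage] += 1
        else:
            revenue += sess.get("revenue", 0.0)  # loop never broke: purchased
    step_rates = {
        f"{a}->{b}": counts[b] / counts[a] if counts[a] else 0.0
        for a, b in zip(STAGES, STAGES[1:])
    }
    n = counts["session"]
    carts, orders = counts["add_to_cart"], counts["purchase"]
    return {
        "counts": counts,
        "step_rates": step_rates,
        "add_to_cart_rate": carts / n if n else 0.0,
        "conversion_rate": orders / n if n else 0.0,
        # share of sessions with a cart that did not purchase
        "cart_abandonment_rate": 1 - orders / carts if carts else 0.0,
        "aov": revenue / orders if orders else 0.0,
        # the weakest transition is where optimisation effort should go first
        "largest_drop": min(step_rates, key=step_rates.get),
    }

sessions = [
    {"product_view": True, "add_to_cart": True, "checkout": True, "purchase": True, "revenue": 80.0},
    {"product_view": True, "add_to_cart": True, "checkout": True},
    {"product_view": True, "add_to_cart": True},
    {"product_view": True},
    {},
]
report = funnel_report(sessions)
print(report["largest_drop"])  # → checkout->purchase
```

In this toy data half of the sessions that reach checkout fail to purchase, which is the weakest step, so checkout is where the sketch would direct attention first.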
Attribution Models and Their Limitations
Marketing attribution assigns credit for a purchase to the marketing touchpoints that influenced the customer's journey. The attribution model determines how that credit is distributed across touchpoints.
**Last-click attribution**: 100% of credit to the final touchpoint before purchase. Simple to implement, and historically the default in many analytics and ad platforms, but it systematically undervalues the top- and mid-funnel channels that build awareness and consideration, and so overstates the direct impact of the channels that close the sale.
**First-click attribution**: 100% of credit to the first touchpoint. Overvalues awareness channels; ignores the channels that drove the final conversion decision.
**Linear attribution**: Equal credit to all touchpoints in the journey. Conceptually fair but difficult to act on, because every channel in a path receives the same credit regardless of the role it actually played.
**Time-decay attribution**: More credit to touchpoints closer to conversion. This encodes the reasonable assumption that touchpoints nearer the purchase are more closely tied to the decision.
**Data-driven attribution**: Machine-learning-based attribution that estimates the incremental contribution of each touchpoint from observed conversion patterns. Available in Google Analytics 4 (where it is the default reporting model) or as a custom implementation. Generally more accurate than rule-based models, but it needs sufficient conversion volume to train reliably.
The practical recommendation for an e-commerce analytics architecture is to implement multiple attribution models and expose them side by side rather than choosing one. Different stakeholders have legitimate reasons to prefer different models; the data team's role is to provide every model together with the context needed to interpret it, not to impose a single one.
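Exposing the rule-based models side by side can be as simple as running every model over the same conversion paths. This is a minimal sketch, not a production attribution engine: a path is assumed to be an ordered list of hypothetical `(channel, days_before_conversion)` tuples, and the 7-day half-life used for time decay is an arbitrary assumption that would normally be configurable. Data-driven attribution is omitted because it requires a trained model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical touchpoint shape: (channel, days_before_conversion), oldest first.
Touchpoint = Tuple[str, float]
Credit = Dict[str, float]

def last_click(path: List[Touchpoint]) -> Credit:
    return {path[-1][0]: 1.0}  # all credit to the final touchpoint

def first_click(path: List[Touchpoint]) -> Credit:
    return {path[0][0]: 1.0}  # all credit to the first touchpoint

def linear(path: List[Touchpoint]) -> Credit:
    credit: Credit = defaultdict(float)
    for channel, _ in path:
        credit[channel] += 1.0 / len(path)  # equal share per touchpoint
    return dict(credit)

def time_decay(path: List[Touchpoint], half_life_days: float = 7.0) -> Credit:
    # A touchpoint's weight halves for every half_life_days before conversion.
    weights = [0.5 ** (days / half_life_days) for _, days in path]
    total = sum(weights)
    credit: Credit = defaultdict(float)
    for (channel, _), weight in zip(path, weights):
        credit[channel] += weight / total  # normalise so shares sum to 1
    return dict(credit)

MODELS: Dict[str, Callable[[List[Touchpoint]], Credit]] = {
    "last_click": last_click,
    "first_click": first_click,
    "linear": linear,
    "time_decay": time_decay,
}

def compare_models(conversions: List[List[Touchpoint]]) -> Dict[str, Credit]:
    """Attributed conversions per channel under every model, side by side."""
    results: Dict[str, Credit] = {}
    for name, model in MODELS.items():
        totals: Credit = defaultdict(float)
        for path in conversions:
            for channel, share in model(path).items():
                totals[channel] += share
        results[name] = dict(totals)
    return results

conversions = [
    [("social", 14), ("email", 3), ("paid_search", 0)],
    [("paid_search", 0)],
]
for model, credit in compare_models(conversions).items():
    print(model, {ch: round(v, 2) for ch, v in credit.items()})
```

Each model distributes exactly one conversion's worth of credit per path, so the totals are comparable across models; only the split between channels changes. Here last click gives social no credit at all, while first click gives it half.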
The fundamental limitation of all digital attribution models is that they only see touchpoints within the trackable digital ecosystem. Offline influences (word of mouth, TV advertising, in-store experience) produce online conversions that get attributed to the last trackable digital touchpoint rather than to their actual cause.
Customer Lifetime Value Architecture
Customer LTV (lifetime value) is the total revenue a customer is expected to generate over their relationship with the business. It requires data at the customer level: all transactions attributed to the customer, correctly identity-resolved.
**Historical LTV**: Total revenue from all past transactions attributed to a customer. Straightforward to compute but backward-looking; useful for segmentation and cohort analysis.
**Predicted LTV**: A forward-looking estimate of total revenue from the customer over a defined future period (typically 12 months or lifetime). It requires a prediction model, either statistical or machine-learning-based. On the statistical side, the Pareto/NBD and BG/NBD models estimate purchase frequency and the probability that a customer is still active; they predict the number of purchases, so a spend model such as Gamma-Gamma is typically paired with them to convert expected purchases into revenue. Predicted LTV is more useful for current-period decisions (who to invest in retaining, who to target with upsell) but requires more implementation investment.
LTV models for e-commerce require clean, complete transaction history at the customer level, identity resolution that correctly attributes transactions to customers (including transactions before account creation), and a customer grain table with all relevant attributes that influence purchase behaviour.
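Historical LTV and a simple cohort view follow directly from identity-resolved transactions. This is a minimal sketch assuming a hypothetical row layout in which `amount` is positive for orders and negative for refunds, so the per-customer sum is net revenue (gross minus returns). Cohorts are keyed by the month of each customer's first order.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

def historical_ltv(transactions: List[dict]) -> Dict[str, float]:
    """Net revenue to date per customer: orders minus refunds.

    Assumes rows are already identity-resolved to a customer_id and that
    'amount' is positive for orders and negative for refunds (hypothetical layout).
    """
    ltv: Dict[str, float] = defaultdict(float)
    for t in transactions:
        ltv[t["customer_id"]] += t["amount"]
    return dict(ltv)

def cohort_average_ltv(transactions: List[dict]) -> Dict[str, float]:
    """Average historical LTV per customer, grouped by first-order month."""
    first_order: Dict[str, date] = {}
    for t in transactions:
        if t["amount"] > 0:  # only orders define the acquisition date
            cust = t["customer_id"]
            if cust not in first_order or t["date"] < first_order[cust]:
                first_order[cust] = t["date"]
    ltv = historical_ltv(transactions)
    cohorts: Dict[str, List[float]] = defaultdict(list)
    for cust, first in first_order.items():
        cohorts[first.strftime("%Y-%m")].append(ltv[cust])
    return {month: sum(v) / len(v) for month, v in sorted(cohorts.items())}

transactions = [
    {"customer_id": "c1", "date": date(2024, 1, 5), "amount": 100.0},
    {"customer_id": "c1", "date": date(2024, 3, 2), "amount": 60.0},
    {"customer_id": "c1", "date": date(2024, 3, 9), "amount": -60.0},  # return
    {"customer_id": "c2", "date": date(2024, 1, 20), "amount": 40.0},
    {"customer_id": "c3", "date": date(2024, 2, 11), "amount": 75.0},
]
print(historical_ltv(transactions))      # → {'c1': 100.0, 'c2': 40.0, 'c3': 75.0}
print(cohort_average_ltv(transactions))  # → {'2024-01': 70.0, '2024-02': 75.0}
```

Without identity resolution, any of these rows recorded under an anonymous ID would be missing from the customer's total, which is why stitching has to happen upstream of this step.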
Product Analytics: Inventory, Assortment, and Merchandising
Beyond customer analytics, e-commerce businesses need product-level analytics:
**Product performance**: Revenue, units sold, return rate, and margin by product and category. Products with high views but low conversion need investigation (pricing, description, images). Products with high returns need investigation (product quality, description accuracy, size guides).
**Inventory analytics**: Stockout rate (the share of days on which a product had zero inventory), overstock by SKU, and inventory turns (units sold / average inventory). Stockouts are direct revenue losses; overstock ties up capital.
**Search analytics**: What do customers search for, and how often do they search and find nothing? High search volume for queries that return no results or irrelevant results indicates a catalog gap or a search algorithm problem.
**Price elasticity**: How does conversion rate change when price changes? A/B test price points on comparable product groups to estimate demand elasticity. Pricing decisions informed by elasticity data tend to produce better margins than intuition-based pricing.
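The product-level metrics above reduce to a few small ratios. This is a minimal sketch with hypothetical function names and inputs: daily on-hand counts for stockouts, per-search result counts for zero-result searches, and, for elasticity, the midpoint (arc) formula applied to two price arms of an A/B test. It assumes quantity is measured as units sold per 1,000 visitors in each arm so that traffic differences between arms cancel out.

```python
from typing import Sequence

def return_rate(units_returned: int, units_sold: int) -> float:
    """Share of units sold that came back."""
    return units_returned / units_sold if units_sold else 0.0

def stockout_rate(daily_on_hand: Sequence[int]) -> float:
    """Share of days on which the product had zero inventory on hand."""
    if not daily_on_hand:
        return 0.0
    return sum(1 for qty in daily_on_hand if qty <= 0) / len(daily_on_hand)

def inventory_turns(units_sold: int, average_inventory: float) -> float:
    """Units sold divided by average units held over the same period."""
    return units_sold / average_inventory if average_inventory else 0.0

def zero_result_rate(results_per_search: Sequence[int]) -> float:
    """Share of searches that returned no results."""
    if not results_per_search:
        return 0.0
    return sum(1 for n in results_per_search if n == 0) / len(results_per_search)

def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Midpoint (arc) price elasticity of demand between two price points."""
    pct_q = (q2 - q1) / ((q1 + q2) / 2)  # % change in quantity
    pct_p = (p2 - p1) / ((p1 + p2) / 2)  # % change in price
    return pct_q / pct_p

# Hypothetical A/B test: units sold per 1,000 visitors in each price arm.
print(round(arc_elasticity(p1=20.0, q1=50.0, p2=22.0, q2=45.0), 2))  # → -1.11
```

An elasticity below -1 means the price rise lost proportionally more units than it gained in price, so revenue per visitor falls (1,000 vs 990 per 1,000 visitors in this example). Margin, not revenue, is what ultimately matters, so in practice the same comparison would be run on gross profit per visitor.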
Our data architecture practice designs e-commerce analytics architectures from event collection through to predictive LTV models — contact us to discuss your e-commerce analytics requirements.