
Marketing Analytics Architecture: Attribution, Funnel Analysis, and Customer Journey Data

Austin Duncan
Managing Director & Principal Data Architect
July 5, 2027 · 13 min read

Marketing analytics is one of the most data-rich and analytically complex domains in enterprise analytics. Attribution, funnel analysis, cohort analysis, and customer lifetime value calculations each require specific data structures and metric definitions. This guide covers the architecture decisions that determine whether marketing analytics is insightful or misleading.

Marketing analytics is analytically demanding in a specific way: the questions are standard (which campaigns are performing, what is the customer acquisition cost, where in the funnel are we losing conversions) but the data required to answer them correctly spans multiple disconnected systems, the metric definitions are contested, and the attribution problem is genuinely hard. Marketing analytics architectures that seem to work often produce misleading results because of subtle errors in how the data is joined, attributed, or counted.

The Data Sources That Marketing Analytics Requires

Marketing analytics requires data from at least four separate domains, each with different characteristics:

**Ad platform data**: Impressions, clicks, and spend from Google Ads, Meta Ads, LinkedIn Ads, and other paid channels. This data is available via the platforms' APIs. The grain is typically ad set or campaign and date; the primary challenge is that each platform reports differently and uses different attribution windows (a Google Ads conversion may have a different attribution window than Meta's).

**Web and app analytics**: Sessions, events, and conversions from analytics platforms (Google Analytics 4, Amplitude, Segment) or server-side event streams. This data includes session-level attribution from UTM parameters and referral sources, funnel events (page views, add-to-cart, checkout), and conversion events.

**CRM data**: Lead records, pipeline stages, opportunity data, and deal closures. For B2B companies, the CRM is the source of truth for the marketing-to-sales funnel. For B2C e-commerce, the CRM or customer table holds identity resolution and purchase history.

**Operational/transaction data**: Actual revenue from closed deals or completed purchases. This is the ultimate outcome metric that marketing is trying to drive — and it often exists only in the transactional system, not in the marketing platforms that measure clicks and impressions.

Integrating these four domains into a unified analytical picture is the central architectural challenge of marketing analytics. Each domain has its own identity representation (Google Ads uses click IDs, the CRM uses customer IDs, the analytics platform uses anonymous user IDs), and identity resolution across them is both technically difficult and imprecise.

UTM Tracking Architecture

UTM parameters (utm_source, utm_medium, utm_campaign, utm_content, utm_term) are the mechanism for attributing marketing traffic to specific campaigns and channels. They are appended to landing page URLs in ad platforms, email links, and social posts.

Correct UTM architecture requires:

**Consistent UTM taxonomy**: A UTM taxonomy defines the allowed values for each UTM parameter and their meaning. Without a taxonomy, different teams use different values for the same concept: "google", "google_ads", and "Google Ads" all appear in the data for the same channel. Channel-level analysis then requires messy string matching instead of clean grouping.

**UTM coverage**: Every paid URL should have UTM parameters. Untagged URLs produce "direct" traffic that cannot be attributed to the campaign that drove it. Audit coverage by listing the destination URLs in each ad platform that lack UTM parameters and weighting them by the paid traffic they receive.

**UTM persistence in session**: UTM parameters captured at session start should be persisted through the session and stored with the conversion event. If a user lands with UTM parameters in the URL but the conversion event does not carry that UTM context, the conversion cannot be attributed to the original campaign.

**Auto-tagging vs manual UTM**: Google Ads auto-tagging appends a click ID (gclid) to the landing URL and is separate from UTM parameters. If you use Google Analytics with a linked Google Ads account, gclid-based attribution works automatically. For custom analytics implementations, you need to either tag URLs with UTM parameters manually or implement a gclid→UTM mapping.
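The requirements above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `SOURCE_ALIASES` taxonomy, the function names, and the rule that a URL counts as "tagged" when it carries either `utm_source` or `gclid` are all assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term")

# Hypothetical taxonomy: maps free-form utm_source values to one canonical value.
SOURCE_ALIASES = {
    "google": "google",
    "google_ads": "google",
    "google ads": "google",
    "facebook": "meta",
    "meta": "meta",
}


def parse_landing_url(url: str) -> dict:
    """Extract UTM parameters and the gclid click ID from a landing page URL."""
    query = parse_qs(urlparse(url).query)
    context = {key: query[key][0] for key in UTM_KEYS if key in query}
    if "gclid" in query:
        context["gclid"] = query["gclid"][0]
    return context


def normalize_source(raw: str) -> str:
    """Map a raw utm_source to its canonical value, or flag it as unmapped."""
    key = raw.strip().lower()
    return SOURCE_ALIASES.get(key, f"unmapped:{key}")


def utm_coverage(urls: list[str]) -> float:
    """Share of paid destination URLs that carry utm_source or gclid."""
    if not urls:
        return 0.0
    tagged = sum(
        1 for url in urls
        if parse_landing_url(url).keys() & {"utm_source", "gclid"}
    )
    return tagged / len(urls)
```

The dictionary returned by `parse_landing_url` is the context that would be stored at session start and copied onto the conversion event. Flagging unknown sources as `unmapped:` rather than silently passing them through makes taxonomy drift visible in reports.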

Funnel Analysis Architecture

Marketing funnel analysis requires a consistent definition of each funnel stage, a grain of analysis (session, user, or account), and a way to connect stages that occur in different sessions.

**Session-grain funnel**: The simplest implementation. For a given session, what is the highest funnel stage reached? This is useful for measuring the conversion efficiency of individual sessions, but it cannot track multi-session journeys: a purchase in Session 3 cannot be attributed to the ad click that drove Session 1.

**User-grain funnel**: Attribute all sessions and events to a known user identity. Track the user's progression through the funnel across sessions. Requires identity resolution — connecting anonymous sessions before login to the known user after login. Provides accurate multi-session funnel measurement.

**Cohort-grain funnel**: Group users by their first interaction date (first session, first ad click, first registration) and track their progression through the funnel over time. Cohort analysis reveals whether conversion rates are improving over time and whether specific acquisition periods have different conversion profiles.

For SaaS and B2B businesses, account-grain funnel analysis — tracking all contacts at an account through the marketing-to-sales funnel — requires a bridge from web visit data (which is person-level) to account data in the CRM.
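As a rough sketch of a user-grain funnel, the Python below stitches anonymous sessions to known users and records the deepest stage each user reached. The stage names, the `identity_map` structure, and the simplification that reaching a deeper stage implies passing the earlier ones are assumptions for illustration only.

```python
# Hypothetical funnel stages, ordered from earliest to deepest.
STAGES = ["visit", "add_to_cart", "checkout", "purchase"]
STAGE_RANK = {stage: rank for rank, stage in enumerate(STAGES)}


def deepest_stage_per_user(events, identity_map):
    """Return the deepest funnel stage reached by each resolved user.

    events: iterable of (anonymous_id, stage) tuples from any number of sessions.
    identity_map: anonymous_id -> known user_id, built at login or registration.
    Anonymous IDs that were never resolved are kept as their own "user".
    """
    deepest = {}
    for anonymous_id, stage in events:
        user = identity_map.get(anonymous_id, anonymous_id)
        if user not in deepest or STAGE_RANK[stage] > STAGE_RANK[deepest[user]]:
            deepest[user] = stage
    return deepest


def funnel_counts(deepest):
    """Count users who reached at least each stage.

    Assumes a user at a deeper stage also passed every earlier stage.
    """
    counts = {stage: 0 for stage in STAGES}
    for stage in deepest.values():
        for reached in STAGES[: STAGE_RANK[stage] + 1]:
            counts[reached] += 1
    return counts
```

Without the identity map, two anonymous IDs belonging to the same person would be counted as one visitor who abandoned and one who purchased. With it, they collapse into a single user whose journey ends in a purchase.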

Marketing Attribution Architecture

Attribution data architecture requires decisions about the attribution window, the attribution model, the click-to-customer identity join, and cross-channel de-duplication.

**Attribution window**: How far back from a conversion do you attribute touchpoints? Google Ads defaults to 30 days for clicks and 1 day for view-through conversions; Meta's defaults differ. Consistent attribution windows across platforms are necessary for apples-to-apples channel comparison. Define your standard attribution window and apply it consistently.

**Click-to-customer identity join**: When a user clicks an ad, the ad platform records a click with a click ID (gclid for Google, fbclid for Meta). The user's session on your site captures this click ID from the URL. The conversion event must carry this click ID to the attribution layer. If the click ID is not captured in the session, the conversion cannot be attributed to the specific ad click that drove it.

**Cross-channel de-duplication**: A customer who clicks a Google Ads ad on Monday, an email on Tuesday, and a Meta ad on Thursday before purchasing on Friday appears in all three platforms' conversion reports. Without cross-channel de-duplication in your own data, you are counting the same conversion three times. Your data warehouse should apply attribution logic across all channels using your first-party conversion data, not relying on platform self-reporting.
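One way to picture warehouse-side attribution is the sketch below, which applies a single window and a last-click rule to first-party touchpoints so that each conversion is credited to exactly one channel. The record layout, the 30-day window, and the choice of last-click are assumptions for the example; any other model could be substituted in the same place.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=30)  # assumed standard window


def last_click_attribution(conversions, touchpoints, window=ATTRIBUTION_WINDOW):
    """Credit each conversion to one channel: the latest touchpoint by the
    same user inside the window. Conversions with no eligible touchpoint
    are labeled "unattributed".

    conversions: list of dicts with keys user_id, order_id, converted_at
    touchpoints: list of dicts with keys user_id, channel, touched_at
    """
    result = {}
    for conv in conversions:
        window_start = conv["converted_at"] - window
        eligible = [
            t for t in touchpoints
            if t["user_id"] == conv["user_id"]
            and window_start <= t["touched_at"] <= conv["converted_at"]
        ]
        if eligible:
            latest = max(eligible, key=lambda t: t["touched_at"])
            result[conv["order_id"]] = latest["channel"]
        else:
            result[conv["order_id"]] = "unattributed"
    return result
```

Because the output is keyed by order ID, a purchase preceded by a Google Ads click, an email click, and a Meta click still appears once in the totals, whichever channel receives the credit.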

Customer Acquisition Cost and ROI

CAC (customer acquisition cost) and ROAS (return on ad spend) are the primary marketing ROI metrics. Both require accurate numerator and denominator definitions.

**CAC calculation**: Total marketing spend (including staff, agency fees, tools, and media spend — not just media spend alone) / new customers acquired in the period. Defining "marketing spend" and "new customer" precisely is where most CAC calculations diverge.

**ROAS calculation**: Revenue attributed to advertising / advertising spend. The attribution model used determines which revenue is "attributed to advertising." The same campaign can show a last-click ROAS of 2x and a first-click ROAS of 4x, because the last-click model hands credit to organic search for conversions that paid advertising initiated earlier in the journey.

**Payback period and LTV/CAC ratio**: For subscription businesses, CAC is recovered over the customer's lifetime rather than at the first purchase. Payback period is the time it takes for a customer's cumulative margin to cover their CAC. LTV/CAC ratio (customer lifetime value / CAC) measures whether the acquisition economics are sound. LTV/CAC > 3x is a common benchmark for healthy unit economics.
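The metric definitions in this section reduce to a handful of ratios. The sketch below is illustrative: the function names are hypothetical, and the payback formula (CAC divided by monthly gross margin per customer) is one common simplification that assumes constant monthly margin and no churn during the payback window.

```python
def _ratio(numerator: float, denominator: float, label: str) -> float:
    """Divide with an explicit error instead of a bare ZeroDivisionError."""
    if denominator == 0:
        raise ValueError(f"{label} must be non-zero")
    return numerator / denominator


def cac(total_marketing_spend: float, new_customers: int) -> float:
    """Fully loaded spend (media, staff, agency, tools) per new customer."""
    return _ratio(total_marketing_spend, new_customers, "new_customers")


def roas(attributed_revenue: float, ad_spend: float) -> float:
    """Revenue credited to advertising by the chosen attribution model, per unit of ad spend."""
    return _ratio(attributed_revenue, ad_spend, "ad_spend")


def ltv_to_cac(lifetime_value: float, cac_value: float) -> float:
    """Customer lifetime value relative to acquisition cost."""
    return _ratio(lifetime_value, cac_value, "cac_value")


def payback_months(cac_value: float, monthly_margin_per_customer: float) -> float:
    """Months of gross margin needed to recover CAC (assumes constant margin)."""
    return _ratio(cac_value, monthly_margin_per_customer, "monthly_margin_per_customer")
```

For example, 120,000 in fully loaded spend and 400 new customers gives a CAC of 300. With a lifetime value of 1,200 and a monthly margin of 50 per customer, that is an LTV/CAC of 4 and a six-month payback.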

