How to build a marketing analytics infrastructure that actually works — GA4 event tracking design, collecting ad platform data from Google Ads and Meta, attribution modelling in the warehouse, and the data model that connects marketing spend to revenue outcomes.
Marketing analytics is one of the most data-intensive functions in any organisation, and also one of the most technically fragmented. Marketing teams collect data from web analytics tools, advertising platforms, CRM systems, email platforms, and conversion tracking — each in a different format, with different identifiers, and with different latency characteristics. Making this data analytically useful requires intentional infrastructure design.
This guide covers the marketing analytics stack: event collection with GA4, ad platform data pipelines, attribution modelling in the warehouse, and the data model that connects marketing spend to revenue.
GA4 Event Tracking Design
Google Analytics 4 replaced Universal Analytics as Google's web analytics standard. The architectural change matters for data engineers. GA4 is entirely event-based: every interaction is an event with parameters. GA4 also exports raw event data to BigQuery natively. The export is available to standard (free) properties, subject to a daily event limit, and to GA4 360 properties, which have much higher limits.
**Event schema design.** GA4 events have a name (string) and up to 25 event parameters (key-value pairs). GA4's automatically collected, enhanced measurement, and recommended events (page_view, session_start, form_submit, purchase) cover most tracking needs. Custom events capture business-specific interactions (e.g., quote_requested, demo_booked, product_comparison_viewed).
Design event names and parameters for downstream analysis, not just for GA4 reporting. An event called button_click with a parameter button_text = 'Request Demo' is harder to analyse than an event called demo_request with a clearly named parameter recording where the call to action appeared (for example, cta_location). Event taxonomy design is a data engineering decision. It should involve the analytics engineer who will model the data, not just the marketing team configuring the tag.
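One way to make the taxonomy a shared, reviewable artefact is to keep the tracking plan in code and check it before tags are published. The sketch below assumes GA4's commonly documented naming and size limits:

- names of up to 40 characters
- at most 25 parameters per event
- reserved ga_/google_/firebase_ prefixes

Confirm these against Google's current documentation. The plan contents, including cta_location and product_line, are hypothetical.

```python
import re

# Assumed GA4 collection limits (verify against Google's current documentation):
# event and parameter names up to 40 characters, at most 25 parameters per event,
# names start with a letter and contain only letters, digits and underscores.
MAX_NAME_LENGTH = 40
MAX_PARAMS_PER_EVENT = 25
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
RESERVED_PREFIXES = ("ga_", "google_", "firebase_")  # assumed reserved for Google/Firebase

# Hypothetical tracking plan: custom event name -> parameters the models will read.
TRACKING_PLAN = {
    "demo_request": ["cta_location", "page_location"],
    "quote_requested": ["product_line", "page_location"],
}


def validate_event(name: str, params: list[str]) -> list[str]:
    """Return the problems found in one custom event definition (empty list = OK)."""
    problems = []
    for label in [name, *params]:
        if len(label) > MAX_NAME_LENGTH:
            problems.append(f"{label!r} is longer than {MAX_NAME_LENGTH} characters")
        if not NAME_PATTERN.fullmatch(label):
            problems.append(f"{label!r} must be a letter followed by letters, digits or _")
        if label.startswith(RESERVED_PREFIXES):
            problems.append(f"{label!r} uses a reserved prefix")
    if len(params) > MAX_PARAMS_PER_EVENT:
        problems.append(f"{name!r} has more than {MAX_PARAMS_PER_EVENT} parameters")
    return problems


if __name__ == "__main__":
    for event_name, event_params in TRACKING_PLAN.items():
        print(event_name, validate_event(event_name, event_params) or "OK")
```

Running a check like this in CI turns the taxonomy agreement into something that fails loudly when a tag drifts from the plan.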
**GA4 BigQuery export.** GA4's native BigQuery export writes raw events to a BigQuery dataset as daily sharded tables (events_YYYYMMDD), one row per event. When streaming export is enabled, it also writes intraday tables (events_intraday_YYYYMMDD). The schema is deeply nested: event parameters are stored as a repeated record of key-value pairs within each event row. dbt models that flatten this structure into a usable format are required for efficient downstream analysis.
A standard GA4 flattening pattern:
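```sql
-- Flatten selected GA4 event parameters into columns (BigQuery Standard SQL).
-- Replace XXXXXXXXX with your GA4 property ID and YYYYMMDD with a date.
select
  event_date,
  event_timestamp,  -- microseconds since the Unix epoch (UTC)
  event_name,
  user_pseudo_id,
  (select value.string_value from unnest(event_params) where key = 'page_location') as page_url,
  -- ga_session_id is only unique per user: key sessions on (user_pseudo_id, session_id)
  (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
  (select value.string_value from unnest(event_params) where key = 'source') as traffic_source
from `analytics_XXXXXXXXX.events_YYYYMMDD`
```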
Standard (non-360) properties are limited to 1 million events per day in the daily export, so high-volume sites may need workarounds. An alternative is to use a CDP (RudderStack, Segment) as the event collection layer. It writes raw events directly to your warehouse, independent of GA4.
Ad Platform Data Pipelines
Marketing spend data lives in Google Ads, Meta Ads Manager, LinkedIn Campaign Manager, TikTok Ads, and other platforms. Each exposes an API. Getting this data into a warehouse is the first requirement for any spend analysis.
**Managed connectors** (Fivetran, Airbyte) handle ad platform API connections, incremental extraction, and schema mapping for all major platforms. That includes API authentication, rate limiting, and data type normalisation. This is the right choice for most teams. Building and maintaining your own ad platform API connectors takes significant engineering effort, not least because the platforms version and deprecate their APIs regularly.
Key dimensions per platform:
Google Ads: campaign, ad group, ad, keyword, device, date. Cost data in the account currency. Conversion data (if Google Ads conversion tracking is configured) at campaign or ad level.
Meta (Facebook/Instagram) Ads: campaign, ad set, ad, placement, device, date. Reach, impressions, clicks, spend, and actions (conversions defined in Meta Events Manager).
LinkedIn Ads: campaign, ad, creative, company, seniority, job function, date. B2B-specific dimensions (company size, industry) are LinkedIn's differentiator.
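**Data normalisation across platforms.** Each platform uses different column names, and sometimes different definitions, for comparable metrics.

- Impressions are "impressions" in both Google Ads and Meta. Meta also reports "reach" (unique people reached), which is a different metric and must not be mapped to impressions.
- Clicks are "clicks" in Google Ads. Meta distinguishes all clicks ("clicks") from link clicks ("inline_link_clicks" in its Insights API).

Build a unified spend model with consistent column names and metric definitions across all platforms as part of the dbt transformation layer.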
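In dbt this would typically be one staging model per platform, unioned into a single spend table. The same mapping logic is sketched in Python below so it can be tested in isolation.

The field names come from the platform APIs:

- Google Ads reports cost_micros, which is cost in millionths of the account currency.
- Meta's Insights API reports spend, date_start and inline_link_clicks.

The exact column names a connector lands may differ, so treat the mapping itself as hypothetical.

```python
from typing import Callable

# Hypothetical column maps: source column -> (unified column, converter).
# Source names are illustrative; check them against the schema your connector lands.
COLUMN_MAPS: dict[str, dict[str, tuple[str, Callable]]] = {
    "google_ads": {
        "campaign_id": ("campaign_id", str),
        "date": ("date", str),
        "cost_micros": ("spend", lambda v: int(v) / 1_000_000),  # micros -> currency units
        "impressions": ("impressions", int),
        "clicks": ("clicks", int),
    },
    "meta": {
        "campaign_id": ("campaign_id", str),
        "date_start": ("date", str),
        "spend": ("spend", float),
        "impressions": ("impressions", int),
        "inline_link_clicks": ("clicks", int),  # link clicks, not all clicks
        # "reach" counts unique people, so it is deliberately not mapped to impressions.
    },
}


def normalise(platform: str, rows: list[dict]) -> list[dict]:
    """Rename and type-convert one platform's rows into the unified spend schema."""
    mapping = COLUMN_MAPS[platform]
    unified_rows = []
    for row in rows:
        unified = {"platform": platform}
        for source_col, (target_col, convert) in mapping.items():
            unified[target_col] = convert(row[source_col])
        unified_rows.append(unified)
    return unified_rows
```

Both platforms report spend in the ad account's currency. Currency conversion, if accounts differ, is a separate step.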
Attribution Modelling in the Warehouse
Attribution is the assignment of credit to marketing touchpoints in the conversion journey. Last-click attribution gives 100% of credit to the last touch before conversion. It has long been the default in analytics tools, although GA4 now defaults to data-driven attribution, and it is the least accurate model for multi-touch journeys. In-warehouse attribution lets you implement the model that best reflects your business.
**Session-level attribution.** Assign conversions to the session in which they occurred, with the traffic source of that session as the attributed channel. This matches GA4's session-scoped acquisition dimensions (such as session source/medium) and is what most marketing teams mean when they say "attribution." The data model: join conversion events to their session, then join sessions to their traffic source.
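A minimal sketch of session-level attribution, assuming conversions and sessions have already been extracted as rows. Field names such as session_source are illustrative. The join key is the pair (user_pseudo_id, ga_session_id), because ga_session_id alone is not unique across users.

```python
def session_attribution(conversions: list[dict], sessions: list[dict]) -> list[dict]:
    """Attribute each conversion to the traffic source of the session it occurred in."""
    # Session key is (user_pseudo_id, session_id): ga_session_id repeats across users.
    source_by_session = {
        (s["user_pseudo_id"], s["session_id"]): s["session_source"] for s in sessions
    }
    attributed = []
    for conv in conversions:
        key = (conv["user_pseudo_id"], conv["session_id"])
        # Conversions whose session is missing are flagged rather than dropped.
        channel = source_by_session.get(key, "(unattributed)")
        attributed.append({**conv, "attributed_channel": channel})
    return attributed
```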
Single-touch and multi-touch attribution models (last touch, first touch, linear, time decay, data-driven):
Last touch: 100% credit to the last touchpoint before conversion. Overvalues retargeting and brand search, undervalues upper-funnel channels.
First touch: 100% credit to the first touchpoint. Overvalues awareness channels, undervalues closing channels.
Linear: Equal credit to all touchpoints. Simplest to implement but ignores position and recency.
Time decay: More credit to touchpoints closer to the conversion. Suits journeys where recent touches are thought to drive the decision; over long sales cycles it heavily discounts early touches.
Data-driven: Credit distributed based on statistical analysis of which touchpoint combinations correlate with higher conversion rates. Requires substantial conversion and touchpoint volume to be statistically reliable, which is why ad platforms have historically set minimum data thresholds before enabling it.
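The rule-based models above reduce to a weighting function over a user's ordered touchpoints. This sketch makes two assumptions:

- times are expressed in days;
- time decay uses an exponential half-life of 7 days, an illustrative default rather than a recommendation.

Data-driven attribution is omitted because it needs a fitted statistical model, not a fixed rule.

```python
def attribution_weights(
    touch_times: list[float],
    conversion_time: float,
    model: str,
    half_life_days: float = 7.0,
) -> list[float]:
    """Credit per touchpoint, in the order of touch_times (assumed ascending); sums to 1."""
    n = len(touch_times)
    if n == 0:
        return []
    if model == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model == "linear":
        return [1.0 / n] * n
    if model == "time_decay":
        # A touch's raw weight halves for every half_life_days before the conversion.
        raw = [0.5 ** ((conversion_time - t) / half_life_days) for t in touch_times]
        total = sum(raw)
        return [w / total for w in raw]
    raise ValueError(f"unknown attribution model: {model}")
```

For touches at days 0, 7 and 14 and a conversion at day 14, time decay with a 7-day half-life gives weights of 1/7, 2/7 and 4/7.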
**Implementing multi-touch attribution in the warehouse:** Build a user journey model that sequences all touchpoints (from GA4 events, CRM activities, and ad platform impressions where available) with their timestamps. For each conversion, identify the touchpoints in a defined lookback window. Apply the attribution model to distribute credit across those touchpoints.
The lookback window is the primary design decision: 30 days is standard for e-commerce, 90–180 days for B2B with longer sales cycles. Touchpoints outside the window receive no credit. Within it, a model such as time decay gives a touchpoint from six months before conversion less influence than one from last week.
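The lookback logic can be sketched as follows. It assumes touchpoints have already been unified into rows with a user ID, channel and timestamp (all field names hypothetical). Linear credit is used for brevity; any of the weighting rules described above could be substituted.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def attribute_with_lookback(
    touchpoints: list[dict], conversions: list[dict], lookback_days: int = 30
) -> dict[str, dict[str, float]]:
    """Linear credit across each conversion's touchpoints inside the lookback window.

    touchpoints: dicts with user_id, channel, ts (datetime)
    conversions: dicts with conversion_id, user_id, ts (datetime)
    Returns {conversion_id: {channel: credit}}; credits for a conversion sum to 1.
    """
    by_user = defaultdict(list)
    for tp in touchpoints:
        by_user[tp["user_id"]].append(tp)

    window = timedelta(days=lookback_days)
    result = {}
    for conv in conversions:
        # Only touches between (conversion - window) and the conversion itself count.
        eligible = [
            tp for tp in by_user[conv["user_id"]]
            if conv["ts"] - window <= tp["ts"] <= conv["ts"]
        ]
        credit = defaultdict(float)
        for tp in eligible:
            credit[tp["channel"]] += 1.0 / len(eligible)
        result[conv["conversion_id"]] = dict(credit)
    return result
```

Consider a conversion on 5 March 2024 preceded by paid search on 1 January, email on 20 February and organic on 1 March. With a 30-day window, only email and organic receive credit (0.5 each). Widening the window to 90 days brings paid search back in at one third.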
Connecting Marketing Spend to Revenue Outcomes
The most valuable marketing analytics ties spend to revenue outcomes: cost per revenue dollar, return on ad spend (ROAS), and customer lifetime value by acquisition channel. This requires joining marketing cost data to revenue data.
**Customer identity resolution.** Ad platforms identify ad clicks with click IDs appended to landing-page URLs (gclid for Google Ads, fbclid for Meta). Web analytics identifies browsers with anonymous IDs (user_pseudo_id in GA4). The warehouse needs a user identity layer that maps these IDs to internal customer identifiers. The standard pattern is to capture UTM parameters and click IDs in GA4 events, for example from the landing-page URL. These are then joined to CRM records when the user converts and provides their email.
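Capturing click IDs and UTM parameters can be as simple as parsing the landing-page URL (GA4's page_location parameter). The linking step below rests on a hypothetical assumption: that form submissions reach the CRM with the visitor's user_pseudo_id attached, for example via a hidden form field. How that ID is captured in practice depends on your tagging setup.

```python
from urllib.parse import parse_qs, urlparse

TRACKED_PARAMS = ("utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid")


def extract_click_context(page_location: str) -> dict[str, str]:
    """Pull UTM parameters and click IDs out of a landing-page URL's query string."""
    query = parse_qs(urlparse(page_location).query)
    return {param: query[param][0] for param in TRACKED_PARAMS if param in query}


def link_to_crm(landing_events: list[dict], form_submissions: list[dict]) -> list[dict]:
    """Join landing-page click context to CRM emails via user_pseudo_id.

    Hypothetical inputs: landing_events carry user_pseudo_id and page_location;
    form_submissions carry user_pseudo_id and email.
    """
    email_by_user = {
        f["user_pseudo_id"]: f["email"].strip().lower() for f in form_submissions
    }
    linked = []
    for event in landing_events:
        context = extract_click_context(event["page_location"])
        email = email_by_user.get(event["user_pseudo_id"])
        if email and context:  # keep only visits that are both identified and tagged
            linked.append({"user_pseudo_id": event["user_pseudo_id"], "email": email, **context})
    return linked
```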
For B2B, the unit of analysis is typically the account, not the individual. Attribution should be at the account level — which campaigns influenced the accounts that converted?
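A sketch of the account-level rollup. It assumes a CRM mapping from contact to account (contact_accounts, hypothetical). Every campaign that touched any contact at a converted account counts as influencing that account.

```python
from collections import defaultdict


def account_influence(
    touchpoints: list[dict],
    contact_accounts: dict[str, str],
    converted_accounts: set[str],
) -> dict[str, set[str]]:
    """Campaigns that touched any contact at each converted account.

    touchpoints: dicts with contact_id and campaign_id
    contact_accounts: CRM mapping of contact_id -> account_id
    """
    influence = defaultdict(set)
    for tp in touchpoints:
        account = contact_accounts.get(tp["contact_id"])  # None if contact is unmatched
        if account in converted_accounts:
            influence[account].add(tp["campaign_id"])
    return dict(influence)
```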
**Cost per customer acquisition (CAC):** Total spend by channel / new customers attributed to that channel. Requires the attribution model to assign first-touch or multi-touch credit to specific channels.
**Revenue by acquisition channel:** Join customer revenue (from CRM or order data) to the channel that acquired them (from the attribution model). This requires a durable customer-to-channel mapping that persists the original acquisition attribution — not just what channel the customer last came from.
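A durable acquisition mapping can be sketched as "earliest attributed conversion wins", so re-running the model over full history never overwrites a customer's original channel. Field names are hypothetical; in the warehouse this would typically be materialised as a table keyed by customer.

```python
from collections import defaultdict


def acquisition_channels(conversions: list[dict]) -> dict[str, str]:
    """Map each customer to the channel of their earliest attributed conversion.

    Later conversions through other channels never replace the original channel.
    """
    acquired_via = {}
    for conv in sorted(conversions, key=lambda c: c["ts"]):
        acquired_via.setdefault(conv["customer_id"], conv["channel"])
    return acquired_via


def revenue_by_acquisition_channel(
    orders: list[dict], acquired_via: dict[str, str]
) -> dict[str, float]:
    """Sum order revenue by the channel that originally acquired each customer."""
    totals = defaultdict(float)
    for order in orders:
        totals[acquired_via.get(order["customer_id"], "(unknown)")] += order["revenue"]
    return dict(totals)
```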
**ROAS by campaign:** Revenue attributed to a campaign / spend on that campaign. The attribution model determines revenue attribution; the ad platform connector provides spend. The join key is the campaign identifier, which must be consistent between the attribution model and the ad platform data.
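The CAC and ROAS arithmetic, together with the join-key point, fits in a short sketch. It assumes spend and attributed credit have already been aggregated per campaign (names hypothetical). It casts every campaign ID to a string, since one source may deliver numeric IDs and another strings. CAC is computed per campaign here; the same arithmetic applies per channel.

```python
def _str_keys(values: dict) -> dict:
    """Cast campaign IDs to strings so numeric IDs from one source join to string IDs."""
    return {str(key): value for key, value in values.items()}


def campaign_metrics(
    spend: dict, attributed_revenue: dict, attributed_new_customers: dict
) -> dict[str, dict]:
    """ROAS and CAC per campaign; customer credit may be fractional under multi-touch."""
    spend, revenue, customers = map(
        _str_keys, (spend, attributed_revenue, attributed_new_customers)
    )
    metrics = {}
    for campaign_id, campaign_spend in spend.items():
        campaign_revenue = revenue.get(campaign_id, 0.0)
        campaign_customers = customers.get(campaign_id, 0.0)
        metrics[campaign_id] = {
            "spend": campaign_spend,
            "roas": campaign_revenue / campaign_spend if campaign_spend else None,
            "cac": campaign_spend / campaign_customers if campaign_customers else None,
        }
    return metrics
```

Take a campaign with 1,000 in spend, 4,000 in attributed revenue and 2.5 attributed new customers (fractional under multi-touch credit). Its ROAS is 4.0 and its CAC is 400.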
The Marketing Data Model
A production marketing analytics data model includes:
**stg_ga4_events:** Flattened GA4 event data with one row per event.
**int_sessions:** Assembled sessions from GA4 events — session_id, user_pseudo_id, session_start, session_channel, session_source, session_medium.
**int_user_journey:** Ordered sequence of all touchpoints per user, within a defined lookback window of each conversion event.
**fct_conversions:** One row per conversion event, with attributed channel and campaign using the chosen attribution model.
**fct_ad_spend:** Unified spend table across all platforms — campaign, ad group, date, spend, impressions, clicks. Normalised column names.
**fct_marketing_performance:** Joined conversions and spend by campaign and date. Revenue, CAC, ROAS, and conversion rate per campaign.
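As an illustration of the int_sessions step, the sketch below assembles sessions from flattened events. It assumes each event row carries user_pseudo_id, session_id (ga_session_id), event_timestamp and traffic_source. It simplifies session source to "the first non-empty source in the session"; GA4's own session attribution rules are more involved.

```python
def build_sessions(events: list[dict]) -> list[dict]:
    """Assemble int_sessions-style rows from flattened GA4 events (stg_ga4_events)."""
    sessions: dict[tuple, dict] = {}
    # Process events in time order so the first event seen sets session_start.
    for ev in sorted(events, key=lambda row: row["event_timestamp"]):
        if ev.get("session_id") is None:
            continue  # events without ga_session_id cannot be assigned to a session
        key = (ev["user_pseudo_id"], ev["session_id"])  # ga_session_id is per-user only
        session = sessions.setdefault(key, {
            "user_pseudo_id": key[0],
            "session_id": key[1],
            "session_start": ev["event_timestamp"],
            "session_source": None,
            "event_count": 0,
        })
        session["event_count"] += 1
        if session["session_source"] is None and ev.get("traffic_source"):
            session["session_source"] = ev["traffic_source"]
    return list(sessions.values())
```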
Marketing analytics in the warehouse provides full control over attribution methodology, unified cross-channel reporting, and the ability to join marketing data to revenue and operational data that lives in the same warehouse. The investment in building and maintaining this stack is significant, but it provides analytical depth that packaged SaaS marketing analytics tools rarely match.
Our data architecture consulting practice designs marketing analytics data models and pipelines — contact us to discuss your marketing analytics requirements.