BlogAnalytics

SQL Joins: INNER, LEFT, RIGHT, FULL OUTER, and CROSS Explained

Eric Chen
Eric Chen
BI Solutions Architect
·January 19, 202810 min read

SQL joins combine rows from two or more tables based on a related column. This guide explains every join type — INNER, LEFT, RIGHT, FULL OUTER, CROSS, and SELF — with diagrams, concrete examples, and the analytical use cases where each join type is appropriate. Covers common join mistakes and performance implications.

What SQL Joins Do

A SQL join combines rows from two or more tables based on a related column — the join condition. Joins are the foundation of relational database queries: data is stored in separate tables (orders, customers, products) and assembled at query time through joins.

Understanding join types — and when each is appropriate — is fundamental to writing correct SQL for analytics.

INNER JOIN

Returns only rows that have matching values in both tables. Non-matching rows from both tables are excluded.

SELECT

o.order_id,

o.order_date,

c.customer_name

FROM orders o

INNER JOIN customers c ON o.customer_id = c.customer_id

Result: Only orders with a matching customer record. If an order has a customer_id that does not exist in the customers table, that order is excluded from the result.

**When to use**: When you need matching records from both sides. Most fact-to-dimension joins in analytical queries are INNER JOINs when referential integrity is maintained — every fact row has a matching dimension row.

**When not to use**: When outer rows (non-matching rows) would contain useful information. An INNER JOIN between orders and marketing_campaigns would exclude all orders that came from no campaign — which is valid data, not missing data.

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table and matching rows from the right table. Rows in the left table with no match in the right table appear with NULL values in the right table columns.

SELECT

c.customer_id,

c.customer_name,

o.order_id,

o.order_date

FROM customers c

LEFT JOIN orders o ON c.customer_id = o.customer_id

Result: All customers, including customers who have never placed an order (order columns are NULL for those rows).

**When to use**: When you want all records from the primary table plus enrichment from a secondary table. The most common join type in analytics:

- All users with their (optional) last login date

- All accounts with their (optional) support ticket count

- All products with their (optional) last order date

**Identifying unmatched rows**: Filter for NULL in the right table's primary key column to find rows with no match:

SELECT c.customer_id, c.customer_name

FROM customers c

LEFT JOIN orders o ON c.customer_id = o.customer_id

WHERE o.order_id IS NULL

This returns customers who have never placed an order.

RIGHT JOIN (RIGHT OUTER JOIN)

The mirror of LEFT JOIN — returns all rows from the right table and matching rows from the left. Rarely used in practice because the same result is achievable by reversing the table order in a LEFT JOIN, which is more readable.

SELECT o.order_id, c.customer_name

FROM orders o

RIGHT JOIN customers c ON o.customer_id = c.customer_id

Equivalent to: SELECT o.order_id, c.customer_name FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id

**When to use**: Effectively never — rewrite as a LEFT JOIN with tables reversed for clarity.

FULL OUTER JOIN

Returns all rows from both tables. Rows from the left table with no match in the right appear with NULL for right table columns; rows from the right table with no match in the left appear with NULL for left table columns.

SELECT

a.date,

a.revenue,

b.cost

FROM revenue_daily a

FULL OUTER JOIN cost_daily b ON a.date = b.date

Result: All dates from both tables — including dates with revenue but no cost (right side NULL), and dates with cost but no revenue (left side NULL).

**When to use**: Reconciling two datasets where either may have records the other does not. Comparing two pipeline outputs for differences. Identifying dates present in one table but missing from another.

Note: Not all databases support FULL OUTER JOIN (MySQL requires a UNION of LEFT and RIGHT JOINs as a workaround).

CROSS JOIN

Returns the Cartesian product — every combination of every row from both tables. If table A has 100 rows and table B has 10 rows, the result has 1,000 rows.

SELECT

products.product_id,

date_spine.date

FROM products

CROSS JOIN date_spine

WHERE date_spine.date >= '2024-01-01'

**When to use**: Generating date spines — cross joining a products table with a date dimension creates one row per product per date, which you can then LEFT JOIN activity against to produce a complete series including days with no activity. Generating all combinations of categories or segments for analysis.

**When not to use by accident**: Forgetting the ON condition in a regular JOIN accidentally produces a CROSS JOIN. A CROSS JOIN on two large tables produces a result set large enough to crash a query. Always verify you have a join condition.

SELF JOIN

A table joined to itself — typically using different aliases for the two references. Used when a table has a hierarchical or sequential relationship within itself.

**Manager-employee hierarchy**:

SELECT

e.employee_name,

m.employee_name AS manager_name

FROM employees e

LEFT JOIN employees m ON e.manager_id = m.employee_id

**Finding consecutive events**:

SELECT

a.session_id,

a.event_type AS event_1,

b.event_type AS event_2,

DATEDIFF('second', a.event_timestamp, b.event_timestamp) AS seconds_between

FROM events a

JOIN events b ON a.session_id = b.session_id

AND b.event_number = a.event_number + 1

Join Conditions Beyond Equality

Joins can use inequality conditions, not just equality:

**Range join** (event lookup by date range):

SELECT o.order_id, p.promotion_name

FROM orders o

JOIN promotions p

ON o.order_date BETWEEN p.start_date AND p.end_date

**Non-equi join** (finding records in different tiers):

SELECT t.transaction_id, r.rate_name

FROM transactions t

JOIN tax_rates r

ON t.amount >= r.min_amount AND t.amount < r.max_amount

Non-equi joins can be expensive — no hash join optimisation for inequality conditions; verify performance on large tables.

Common Join Mistakes

**Fanout**: Joining to a table where multiple rows match one row in the primary table inflates COUNT and SUM metrics. Example: joining orders to promotions where one order applied multiple promotions — each order appears multiple times, inflating order count and revenue.

Fix: aggregate first, then join. Or use COUNT(DISTINCT order_id) instead of COUNT(*) when fanout is possible.

**Missing records misdiagnosed as bad data**: If a LEFT JOIN produces unexpected NULLs, the cause is often legitimate missing data, not a data quality issue. Investigate whether the NULLs represent meaningful absence before assuming a bug.

**Ambiguous column names**: When joining tables with columns of the same name, always qualify with the table alias (o.customer_id not customer_id). Unqualified column names produce ambiguity errors or silently return the wrong column.

**INNER JOIN data loss**: Using INNER JOIN when LEFT JOIN was intended silently excludes non-matching rows. In analytical queries, this often appears as an undercount — metrics are lower than expected because some records were dropped.

Our BI strategy and data architecture practices help teams build SQL-based analytics foundations — contact us to discuss your analytics engineering requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →