How BigQuery ML enables training and deploying machine learning models using SQL directly in BigQuery — without exporting data, without Python infrastructure, and with the governance and access controls of your existing data warehouse.
BigQuery ML (BQML) enables machine learning model training and inference directly within BigQuery using SQL. Rather than exporting data to a Python environment, building a training pipeline, managing infrastructure, and deploying a serving endpoint, you define, train, and apply models with CREATE MODEL and ML.PREDICT SQL statements against data that is already in BigQuery.
For data teams that live in SQL and want to apply ML to their analytical data without building a separate ML infrastructure, BQML provides a practical path. For pure ML engineers building complex models, it is not a replacement for TensorFlow or PyTorch. The distinction matters for choosing when BQML is the right tool.
What BigQuery ML Provides
**Model training in SQL.** BigQuery ML trains models by running SQL against data in your BigQuery tables. The CREATE MODEL statement specifies the model type, the training data (a SELECT query), and model options (regularisation, learning rate, etc.). BigQuery handles the training infrastructure automatically.
**Supported model types.** BQML supports a range of model types:
- Linear regression and logistic regression (Google's fast, in-house implementation)
- K-means clustering for segmentation
- Matrix factorisation for recommendation systems
- Time series forecasting (ARIMA_PLUS with automatic seasonality detection)
- Boosted trees (XGBoost)
- Deep Neural Networks (DNN)
- Imported TensorFlow/Keras models
- Remote models: calling Vertex AI models or generative AI models (Gemini, PaLM 2) via SQL
**Model evaluation in SQL.** ML.EVALUATE runs the model against a held-out evaluation dataset and returns standard metrics — RMSE, MAE, accuracy, precision, recall, AUC — as a SQL query result.
**Prediction in SQL.** ML.PREDICT takes a trained model and a SELECT query as input and returns predictions alongside the input features. Predictions can be joined back to analytical tables, used in dashboards, or used as inputs to further transformations.
**Feature preprocessing.** BQML includes built-in feature transformation functions that run during training and are automatically applied during prediction: TRANSFORM clause for numerical normalisation, one-hot encoding, bucketing, and embedding lookup.
Practical Use Cases
**Customer churn prediction.** Build a logistic regression or boosted trees model that predicts which customers are likely to churn in the next 90 days, based on engagement metrics, purchase history, and support interactions — all of which are already in BigQuery. Score the full customer base weekly with ML.PREDICT and surface at-risk customers in Looker or Tableau dashboards.
**Sales forecasting.** ARIMA_PLUS provides time series forecasting with automatic trend and seasonality detection. Creating a sales forecast model against historical order data requires a single CREATE MODEL statement. Forecasts are updated by re-running ML.PREDICT against recent data.
**Customer segmentation.** K-means clustering on customer behavioural features (purchase frequency, average order value, product category mix) produces customer segments without leaving BigQuery. The segments can be written back to a BigQuery table and joined to other analytical queries.
**Recommendation systems.** Matrix factorisation models learn user-item interaction patterns from implicit feedback (views, purchases, clicks) to generate personalised recommendations. For B2B analytics platforms, this can drive "related reports" or "customers like you also bought" features.
**Anomaly detection on metrics.** Time series models can flag anomalies in operational metrics — unusual spikes in error rates, unexpected drops in transaction volume — using ML.DETECT_ANOMALIES.
**LLM inference at scale.** Remote models allow calling Vertex AI's Gemini models from SQL. Combined with BigQuery's scale, this enables applying LLM inference across millions of rows — sentiment analysis, text classification, summarisation — as SQL queries. This overlaps with Snowflake Cortex's value proposition for organisations on Google Cloud.
Creating and Training a Model
A logistic regression model for churn prediction:
CREATE OR REPLACE MODEL dataset.churn_model
OPTIONS(
model_type = 'LOGISTIC_REG',
auto_class_weights = TRUE,
input_label_cols = ['churned']
) AS
SELECT
customer_id,
days_since_last_order,
total_orders_last_90_days,
average_order_value,
support_tickets_last_90_days,
churned
FROM dataset.customer_features
WHERE partition_date BETWEEN '2024-01-01' AND '2024-12-31';
Training runs asynchronously in BigQuery's infrastructure. Training time depends on dataset size and model complexity — logistic regression trains in minutes; DNN models on large datasets may take hours.
After training, evaluate the model:
SELECT * FROM ML.EVALUATE(MODEL dataset.churn_model)
And predict against the current customer base:
SELECT
c.customer_id,
c.customer_name,
p.predicted_churned_probs[OFFSET(0)].prob AS churn_probability
FROM ML.PREDICT(MODEL dataset.churn_model,
(SELECT * FROM dataset.customer_features WHERE partition_date = CURRENT_DATE)
) p
JOIN dataset.customers c ON p.customer_id = c.customer_id
ORDER BY churn_probability DESC;
BQML vs Vertex AI
BQML and Vertex AI serve different positions in the ML workflow:
BQML is for:
- Standard ML tasks (regression, classification, clustering, time series) where the model type is well-matched to BQML's supported types
- Teams who live in SQL and want to avoid Python infrastructure
- Applying trained models to large BigQuery datasets efficiently
- Rapid prototyping and proof of concept
Vertex AI is for:
- Custom model architectures not supported in BQML
- PyTorch or TensorFlow models requiring GPU training
- Advanced MLOps (model monitoring, A/B testing, deployment infrastructure)
- Models that will be served as real-time endpoints rather than batch predictions
For analytical teams doing batch ML against warehouse data, BQML is often the right starting point. As ML requirements mature and exceed BQML's capabilities, Vertex AI is the natural extension — and the two are integrated: Vertex AI models can be imported into BQML as remote models, and BQML models can be exported to Vertex AI for serving.
**Cost considerations.** BQML training costs are billed per byte of training data processed, at rates higher than standard BigQuery query pricing. Evaluate training cost against the alternative (Vertex AI training on smaller exported datasets) for large models.
For data teams on Google Cloud evaluating ML within their analytics stack, our data architecture consulting and cloud engineering practices cover GCP data platform design — contact us to discuss your requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →