Performance tuning & optimization for Snowflake -> BigQuery
Make BigQuery fast and predictable after migration. We tune queries, table layout, and execution strategy so dashboards refresh on time and scan costs stay stable as volume grows.
- Input
- Snowflake Performance tuning & optimization logic
- Output
- BigQuery equivalent (validated)
Why this breaks
Snowflake and BigQuery reward different habits. After migration, teams often keep Snowflake-era query shapes and expect the optimizer to rescue performance. The result is predictable: slower dashboards, higher scan bytes, and unstable costs.
Common post-migration symptoms:
- Queries scan entire tables because partition filters aren’t pushed down
- Heavy joins reshuffle large datasets; BI queries become expensive and slow
- MERGE/upsert jobs scan full targets due to missing pruning boundaries
- Semi-structured transforms (VARIANT->JSON) add expensive casts and repeated extraction
- Concurrency spikes cause slot contention or unpredictable runtimes
Optimization isn’t “nice to have.” It’s the difference between a successful BigQuery cutover and a permanent cost/perf firefight.
How conversion works
- Baseline the top workloads: identify the most expensive and most business-critical queries/pipelines (dashboards, marts, incremental loads).
- Diagnose root causes: scan bytes, join patterns, skew, partition pruning, repeated JSON extraction, and MERGE scopes.
- Tune table layout: partitioning, clustering, and staging boundaries aligned to query access paths.
- Rewrite for pruning and reuse: predicate pushdown-friendly filters, pre-aggregation, materialized views, and de-duplication of expensive transforms.
- Capacity & cost governance: reservations/autoscaling posture, concurrency controls, and cost guardrails.
- Regression gates: performance baselines + thresholds so future changes don’t reintroduce scan blowups.
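The baselining step above can lean on BigQuery's built-in job metadata. A minimal sketch of a baseline-capture query, assuming the US multi-region (adjust the region qualifier and lookback window to your environment):

```sql
-- Capture the most expensive recent queries as a tuning baseline.
-- `region-us` is an assumption; use your project's region.
SELECT
  job_id,
  user_email,
  total_bytes_processed,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_s
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 50;
```

Recording bytes scanned and slot time per job gives both the ranking for the tuning backlog and the "before" numbers for regression gates.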
Supported constructs
Representative tuning levers we apply for Snowflake -> BigQuery workloads.
| Source | Target | Notes |
|---|---|---|
| Snowflake clustering/micro-partition effects | BigQuery partitioning + clustering | Align layout to filter + join access paths to maximize pruning. |
| BI query patterns (Looker/Tableau/PowerBI) | Pre-aggregation + materialized views (where appropriate) | Reduce repeated scans and stabilize refresh SLAs. |
| MERGE/upsert workloads | Partition-scoped staging + MERGE boundaries | Avoid full-target scans by scoping apply windows. |
| VARIANT-heavy transforms | Typed extraction tables + reuse | Extract once, cast once, then join/aggregate on typed columns. |
| Warehouse sizing and concurrency | Reservations/slots + workload management | Predictable performance under peak refresh and batch windows. |
| Ad-hoc expensive queries | Governance: guardrails + cost controls | Prevent scan blowups from new patterns and unmanaged access. |
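As a sketch of the layout lever in the first row, a partitioned and clustered table aligned to the filter and join paths might be declared like this (table and column names are illustrative; `require_partition_filter` forces pruning-friendly queries):

```sql
-- Partition by the column dashboards filter on; cluster by common
-- predicate/join columns. customer_id is an assumed join key.
CREATE TABLE `proj.mart.fact_orders`
PARTITION BY DATE(event_ts)
CLUSTER BY country, customer_id
OPTIONS (require_partition_filter = TRUE)
AS
SELECT * FROM `proj.staging.fact_orders_raw`;
```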
How workload changes
| Topic | Snowflake | BigQuery |
|---|---|---|
| Primary cost driver | Warehouse credits (compute time) | Bytes scanned + slot time |
| Data layout impact | Micro-partitions and clustering can hide suboptimal SQL | Partitioning/clustering must match access paths |
| Concurrency behavior | Warehouse scaling model | Slots/reservations + concurrency policies |
| Optimization style | Often query-level tweaks and warehouse tuning | Pruning-aware rewrites + materialization + governance |
Examples
Illustrative BigQuery optimization patterns: enforce pruning, extract JSON once, and scope MERGEs. Replace datasets and fields to match your environment.
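For the extract-once pattern: rather than calling JSON_VALUE repeatedly in every downstream query, materialize typed columns once and reuse them. A hedged sketch with illustrative table, payload, and field names:

```sql
-- Extract and cast once into a typed, partitioned table;
-- downstream joins/aggregations then touch only typed columns.
CREATE OR REPLACE TABLE `proj.mart.events_typed`
PARTITION BY DATE(event_ts) AS
SELECT
  TIMESTAMP(JSON_VALUE(payload, '$.event_ts')) AS event_ts,
  JSON_VALUE(payload, '$.country') AS country,
  SAFE_CAST(JSON_VALUE(payload, '$.revenue') AS NUMERIC) AS revenue
FROM `proj.raw.events`;
```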
-- Pruning-friendly pattern: ensure partition filter is present
-- Example: fact table partitioned by DATE(event_ts)
SELECT
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
WHERE DATE(event_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;

Common pitfalls
- Partitioning after the fact: migrating tables without aligning partitions to query filters.
- Clustering by intuition: clustering keys chosen without evidence from real predicates and join keys.
- Unbounded MERGE: applying MERGE without scoping to the affected partition window.
- Repeated JSON extraction: calling JSON_VALUE/JSON_QUERY many times per row instead of extracting once into typed columns.
- Over-materialization: creating many intermediate tables/views without controlling refresh cost.
- Ignoring concurrency: BI refresh spikes overwhelm slots/reservations and create tail latency.
- No regression gates: performance improvements disappear after the next model change.
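A partition-scoped MERGE that avoids the unbounded-MERGE pitfall might look like the sketch below. Table names and the apply window are illustrative; literal date bounds on both the source and the target keep partition pruning effective:

```sql
-- Scope both sides of the MERGE to the affected partition window
-- so BigQuery prunes target partitions instead of scanning the full table.
MERGE `proj.mart.fact_orders` T
USING (
  SELECT *
  FROM `proj.staging.orders_delta`
  WHERE DATE(event_ts) BETWEEN '2024-01-01' AND '2024-01-03'  -- apply window
) S
ON T.order_id = S.order_id
   AND DATE(T.event_ts) BETWEEN '2024-01-01' AND '2024-01-03' -- prunes target
WHEN MATCHED THEN
  UPDATE SET revenue = S.revenue
WHEN NOT MATCHED THEN
  INSERT (order_id, event_ts, country, revenue)
  VALUES (S.order_id, S.event_ts, S.country, S.revenue);
```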
Validation approach
- Baseline capture: for each top query/pipeline, record runtime, bytes scanned, slot time, and output row counts.
- Plan-level checks: confirm partition pruning and predicate pushdown on representative parameters.
- Before/after evidence: show improvements on runtime + scan bytes; record exceptions and tradeoffs.
- Correctness guardrails: KPI aggregates and golden queries ensure tuning doesn’t change semantics.
- Regression thresholds: define alert thresholds (e.g., +25% bytes scanned or +30% runtime) and enforce in CI or scheduled checks.
- Operational monitors: post-tuning dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
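One way to enforce a threshold like "+25% bytes scanned" is a scheduled check against the job history. This sketch assumes a hypothetical baseline table `proj.ops.query_baselines` (keyed by a hash of the query text) populated during the baseline step:

```sql
-- Flag recent queries whose scanned bytes exceed their recorded
-- baseline by more than 25%. Baseline table and key are assumptions.
SELECT
  j.job_id,
  j.total_bytes_processed,
  b.baseline_bytes
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
JOIN `proj.ops.query_baselines` AS b
  ON b.query_hash = TO_HEX(MD5(j.query))
WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND j.total_bytes_processed > b.baseline_bytes * 1.25;
```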
Migration steps
- 01
Identify top cost and SLA drivers
Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.
- 02
Create baselines and targets
Capture current BigQuery job metrics (runtime, scan bytes, slot time) and define improvement targets. Freeze golden outputs so correctness doesn’t regress.
- 03
Tune layout: partitioning and clustering
Align partition keys to the most common filters and refresh windows. Choose clustering keys based on observed predicates and join keys, not guesses.
- 04
Rewrite for pruning and reuse
Apply pruning-aware SQL rewrites, reduce reshuffles, pre-aggregate where needed, and extract semi-structured fields once into typed tables for reuse.
- 05
Capacity posture and governance
Set reservations/slot strategy (or on-demand posture), tune concurrency, and implement guardrails to prevent scan blowups from new queries.
- 06
Add regression gates
Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.
We identify your highest-cost queries and pipelines, tune pruning and layout, and deliver before/after evidence with regression thresholds, so optimization sticks and costs stay stable.
Get an optimization backlog, tuned table layouts, and performance gates (runtime/bytes/slot thresholds) so future model changes don’t reintroduce slow refreshes or high spend.