Performance tuning & optimization for Snowflake -> BigQuery
Make BigQuery fast and predictable after migration. We tune queries, table layout, and execution strategy so dashboards refresh on time and scan costs stay stable as volume grows.
- Input
- Snowflake Performance tuning & optimization logic
- Output
- BigQuery equivalent (validated)
Why this breaks
Snowflake and BigQuery reward different habits. After migration, teams often keep Snowflake-era query shapes and expect the optimizer to rescue performance. The result is predictable: slower dashboards, higher scan bytes, and unstable costs.
Common post-migration symptoms:
- Queries scan entire tables because partition filters aren’t pushed down
- Heavy joins reshuffle large datasets; BI queries become expensive and slow
- MERGE/upsert jobs scan full targets due to missing pruning boundaries
- Semi-structured transforms (VARIANT->JSON) add expensive casts and repeated extraction
- Concurrency spikes cause slot contention or unpredictable runtimes
Optimization isn’t “nice to have.” It’s the difference between a successful BigQuery cutover and a permanent cost/perf firefight.
How conversion works
- Baseline the top workloads: identify the most expensive and most business-critical queries/pipelines (dashboards, marts, incremental loads).
- Diagnose root causes: scan bytes, join patterns, skew, partition pruning, repeated JSON extraction, and MERGE scopes.
- Tune table layout: partitioning, clustering, and staging boundaries aligned to query access paths.
- Rewrite for pruning and reuse: predicate pushdown-friendly filters, pre-aggregation, materialized views, and de-duplication of expensive transforms.
- Capacity & cost governance: reservations/autoscaling posture, concurrency controls, and cost guardrails.
- Regression gates: performance baselines + thresholds so future changes don’t reintroduce scan blowups.
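The baselining step above can lean on BigQuery's built-in job metadata. A minimal sketch of a baseline-capture query, assuming the US multi-region (adjust the region qualifier and lookback window to your environment):

```sql
-- Capture the most expensive recent queries as a tuning baseline.
-- `region-us` is an assumption; use your project's region.
SELECT
  job_id,
  user_email,
  total_bytes_processed,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_s
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 50;
```

Recording bytes scanned and slot time per job gives both the ranking for the tuning backlog and the "before" numbers for regression gates.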
Supported constructs
Representative tuning levers we apply for Snowflake -> BigQuery workloads.
| Source | Target | Notes |
|---|---|---|
| Snowflake clustering/micro-partition effects | BigQuery partitioning + clustering | Align layout to filter + join access paths to maximize pruning. |
| BI query patterns (Looker/Tableau/PowerBI) | Pre-aggregation + materialized views (where appropriate) | Reduce repeated scans and stabilize refresh SLAs. |
| MERGE/upsert workloads | Partition-scoped staging + MERGE boundaries | Avoid full-target scans by scoping apply windows. |
| VARIANT-heavy transforms | Typed extraction tables + reuse | Extract once, cast once, then join/aggregate on typed columns. |
| Warehouse sizing and concurrency | Reservations/slots + workload management | Predictable performance under peak refresh and batch windows. |
| Ad-hoc expensive queries | Governance: guardrails + cost controls | Prevent scan blowups from new patterns and unmanaged access. |
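As a sketch of the layout lever in the first row, a partitioned and clustered table aligned to the filter and join paths might be declared like this (table and column names are illustrative; `require_partition_filter` forces pruning-friendly queries):

```sql
-- Partition by the column dashboards filter on; cluster by common
-- predicate/join columns. customer_id is an assumed join key.
CREATE TABLE `proj.mart.fact_orders`
PARTITION BY DATE(event_ts)
CLUSTER BY country, customer_id
OPTIONS (require_partition_filter = TRUE)
AS
SELECT * FROM `proj.staging.fact_orders_raw`;
```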
How workload changes
| Topic | Snowflake | BigQuery |
|---|---|---|
| Primary cost driver | Warehouse credits (compute time) | Bytes scanned + slot time |
| Data layout impact | Micro-partitions and clustering can hide suboptimal SQL | Partitioning/clustering must match access paths |
| Concurrency behavior | Warehouse scaling model | Slots/reservations + concurrency policies |
| Optimization style | Often query-level tweaks and warehouse tuning | Pruning-aware rewrites + materialization + governance |
Examples
Illustrative BigQuery optimization patterns: enforce pruning, extract JSON once, and scope MERGEs. Replace datasets and fields to match your environment.
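For the extract-once pattern: rather than calling JSON_VALUE repeatedly in every downstream query, materialize typed columns once and reuse them. A hedged sketch with illustrative table, payload, and field names:

```sql
-- Extract and cast once into a typed, partitioned table;
-- downstream joins/aggregations then touch only typed columns.
CREATE OR REPLACE TABLE `proj.mart.events_typed`
PARTITION BY DATE(event_ts) AS
SELECT
  TIMESTAMP(JSON_VALUE(payload, '$.event_ts')) AS event_ts,
  JSON_VALUE(payload, '$.country') AS country,
  SAFE_CAST(JSON_VALUE(payload, '$.revenue') AS NUMERIC) AS revenue
FROM `proj.raw.events`;
```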
-- Pruning-friendly pattern: ensure partition filter is present
-- Example: fact table partitioned by DATE(event_ts)
SELECT
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
WHERE DATE(event_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;

Common pitfalls
- Partitioning after the fact: migrating tables without aligning partitions to query filters.
- Clustering by intuition: clustering keys chosen without evidence from real predicates and join keys.
- Unbounded MERGE: applying MERGE without scoping to the affected partition window.
- Repeated JSON extraction: calling JSON_VALUE/JSON_QUERY many times per row instead of extracting once into typed columns.
- Over-materialization: creating many intermediate tables/views without controlling refresh cost.
- Ignoring concurrency: BI refresh spikes overwhelm slots/reservations and create tail latency.
- No regression gates: performance improvements disappear after the next model change.
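A partition-scoped MERGE that avoids the unbounded-MERGE pitfall might look like the sketch below. Table names and the apply window are illustrative; literal date bounds on both the source and the target keep partition pruning effective:

```sql
-- Scope both sides of the MERGE to the affected partition window
-- so BigQuery prunes target partitions instead of scanning the full table.
MERGE `proj.mart.fact_orders` T
USING (
  SELECT *
  FROM `proj.staging.orders_delta`
  WHERE DATE(event_ts) BETWEEN '2024-01-01' AND '2024-01-03'  -- apply window
) S
ON T.order_id = S.order_id
   AND DATE(T.event_ts) BETWEEN '2024-01-01' AND '2024-01-03' -- prunes target
WHEN MATCHED THEN
  UPDATE SET revenue = S.revenue
WHEN NOT MATCHED THEN
  INSERT (order_id, event_ts, country, revenue)
  VALUES (S.order_id, S.event_ts, S.country, S.revenue);
```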
Validation approach
- Baseline capture: for each top query/pipeline, record runtime, bytes scanned, slot time, and output row counts.
- Plan-level checks: confirm partition pruning and predicate pushdown on representative parameters.
- Before/after evidence: show improvements on runtime + scan bytes; record exceptions and tradeoffs.
- Correctness guardrails: KPI aggregates and golden queries ensure tuning doesn’t change semantics.
- Regression thresholds: define alert thresholds (e.g., +25% bytes scanned or +30% runtime) and enforce in CI or scheduled checks.
- Operational monitors: post-tuning dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
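One way to enforce a threshold like "+25% bytes scanned" is a scheduled check against the job history. This sketch assumes a hypothetical baseline table `proj.ops.query_baselines` (keyed by a hash of the query text) populated during the baseline step:

```sql
-- Flag recent queries whose scanned bytes exceed their recorded
-- baseline by more than 25%. Baseline table and key are assumptions.
SELECT
  j.job_id,
  j.total_bytes_processed,
  b.baseline_bytes
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
JOIN `proj.ops.query_baselines` AS b
  ON b.query_hash = TO_HEX(MD5(j.query))
WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND j.total_bytes_processed > b.baseline_bytes * 1.25;
```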
Migration steps
- 01
Identify top cost and SLA drivers
Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.
- 02
Create baselines and targets
Capture current BigQuery job metrics (runtime, scan bytes, slot time) and define improvement targets. Freeze golden outputs so correctness doesn’t regress.
- 03
Tune layout: partitioning and clustering
Align partition keys to the most common filters and refresh windows. Choose clustering keys based on observed predicates and join keys, not guesses.
- 04
Rewrite for pruning and reuse
Apply pruning-aware SQL rewrites, reduce reshuffles, pre-aggregate where needed, and extract semi-structured fields once into typed tables for reuse.
- 05
Capacity posture and governance
Set reservations/slot strategy (or on-demand posture), tune concurrency, and implement guardrails to prevent scan blowups from new queries.
- 06
Add regression gates
Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.
We identify your highest-cost queries and pipelines, tune pruning and layout, and deliver before/after evidence with regression thresholds, so optimization sticks and costs stay stable.
Get an optimization backlog, tuned table layouts, and performance gates (runtime/bytes/slot thresholds) so future model changes don’t reintroduce slow refreshes or high spend.