Workload

Performance tuning & optimization for Redshift → BigQuery

Redshift tuning assumptions (DISTKEY/SORTKEY, VACUUM/ANALYZE, WLM) don’t carry over. We tune BigQuery layout, queries, and capacity so refresh SLAs hold and scan costs stay predictable as volume grows.

At a glance

Input: Redshift performance tuning & optimization logic
Output: BigQuery equivalent (validated)
Common pitfalls
  • Carrying over DIST/SORT thinking: assuming physical distribution patterns translate; BigQuery requires pruning and layout-to-filter alignment.
  • Partitioning after the fact: migrating tables without aligning partitions to common filters.
  • Clustering by folklore: clustering keys chosen without evidence from predicates/join keys.
Context

Why this breaks

Redshift performance tuning is often encoded in physical design and platform operations: DISTKEY/SORTKEY choices, VACUUM/ANALYZE, and WLM queues. After migration, teams keep Redshift-era query shapes and expect BigQuery to behave similarly—then costs spike and SLAs slip because BigQuery’s cost/perf model is different.

Common post-cutover symptoms:

  • Queries scan entire tables because partition filters aren’t pushed down
  • Joins reshuffle large datasets; BI refresh becomes slow and expensive
  • Incremental jobs and MERGEs scan full targets due to missing pruning boundaries
  • Concurrency spikes cause slot contention and tail latency
  • Spend becomes unpredictable because there are no regression gates or guardrails

Optimization replaces Redshift’s DIST/SORT playbook with BigQuery-native, evidence-driven tuning.

Approach

How conversion works

  1. Baseline the top workloads: identify the most expensive and most business-critical queries/pipelines (dashboards, marts, incremental loads).
  2. Diagnose root causes: scan bytes, join patterns, skew, partition pruning, repeated transforms, and MERGE scopes.
  3. Tune table layout: partitioning and clustering aligned to access paths and refresh windows.
  4. Rewrite for pruning and reuse: predicate pushdown-friendly filters, pre-aggregation, materialized views, and de-duplication of expensive transforms.
  5. Capacity & cost governance: reservations/on-demand posture, concurrency controls, and cost guardrails.
  6. Regression gates: baselines + thresholds so future changes don’t reintroduce scan blowups.
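
The layout step (3) above can be sketched in BigQuery DDL. This is a minimal illustration, not a prescription: the table and column names (`proj.mart.fact_orders`, `event_ts`, `country`, `customer_id`) are placeholders, and the clustering columns should come from observed predicates and join keys in your workload.

```sql
-- Illustrative layout: partition the fact table by event date and
-- cluster by the columns that appear most often in filters and joins.
-- All names here are placeholders.
CREATE TABLE `proj.mart.fact_orders_tuned`
PARTITION BY DATE(event_ts)
CLUSTER BY country, customer_id
OPTIONS (
  require_partition_filter = TRUE  -- reject queries that would scan every partition
)
AS
SELECT * FROM `proj.mart.fact_orders`;
```

Setting `require_partition_filter` turns the pruning boundary into a hard guardrail: queries that omit a partition filter fail fast instead of silently scanning the whole table.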

Supported constructs

Representative tuning levers we apply for Redshift → BigQuery workloads.

Source | Target | Notes
DISTKEY/SORTKEY-era performance assumptions | Partitioning + clustering aligned to access paths | Replace physical tuning with pruning-first layout decisions.
WLM queues and concurrency settings | Reservations/slots + concurrency policies | Stabilize refresh SLAs under peak BI load.
UPSERT jobs and incremental loads | Partition-scoped MERGE and pruning-aware staging | Avoid full-target scans and unpredictable runtime.
Repeated BI scans | Pre-aggregation + materialized views (where appropriate) | Reduce scan bytes and stabilize dashboards.
VACUUM/ANALYZE maintenance habits | Layout + rewrite strategy validated via job metrics | Use evidence (bytes/slot time) rather than maintenance folklore.
Ad-hoc expensive queries | Governance: guardrails + cost controls | Prevent scan blowups from unmanaged access.
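
The "partition-scoped MERGE" lever above can be sketched as follows. All dataset, table, and column names are placeholders; the pattern is to precompute the partitions a staging batch touches and restrict the MERGE target to them, so BigQuery prunes the rest of the fact table.

```sql
-- Illustrative partition-scoped MERGE (all names are placeholders).
-- Precompute the affected partition dates from the staging batch.
DECLARE affected_dates ARRAY<DATE> DEFAULT (
  SELECT ARRAY_AGG(DISTINCT DATE(event_ts)) FROM `proj.stage.orders_batch`
);

MERGE `proj.mart.fact_orders` AS t
USING `proj.stage.orders_batch` AS s
ON t.order_id = s.order_id
   AND DATE(t.event_ts) IN UNNEST(affected_dates)  -- pruning boundary on the target
WHEN MATCHED THEN
  UPDATE SET revenue = s.revenue, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, event_ts, country, revenue, updated_at)
  VALUES (s.order_id, s.event_ts, s.country, s.revenue, s.updated_at);
```

Without the date condition on the target, the MERGE scans the full fact table on every incremental run; with it, runtime and bytes scale with the batch, not the table.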

How workload changes

Topic | Redshift | BigQuery
Primary cost driver | Cluster resources + WLM behavior | Bytes scanned + slot time
Data layout impact | DIST/SORT keys can hide suboptimal SQL | Partitioning/clustering must match access paths
Concurrency planning | WLM queues and limits | Slots/reservations + concurrency policies
Optimization style | Often query tuning + maintenance operations | Pruning-aware rewrites + layout + governance

What this means in practice:
  • Primary cost driver: pruning and query shape dominate spend.
  • Data layout impact: layout decisions become first-class performance levers.
  • Concurrency planning: peak BI refresh needs an explicit capacity posture.
  • Optimization style: tuning is holistic, spanning SQL, layout, capacity, and guardrails.

Examples

Illustrative BigQuery optimization patterns after Redshift migration: enforce pruning, pre-aggregate for BI, and store baselines for regression gates. Replace datasets and fields to match your environment.

```sql
-- Pruning-first query shape (fact table partitioned by DATE(event_ts))
SELECT
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
WHERE DATE(event_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;
```
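
For repeated BI scans of the same aggregate, a materialized view can serve the rollup instead of the fact table. This is a sketch under the same placeholder names as above; the view name and column list are illustrative.

```sql
-- Illustrative pre-aggregation: a materialized view serving the dashboard's
-- daily revenue rollup (names are placeholders).
CREATE MATERIALIZED VIEW `proj.mart.mv_daily_revenue`
AS
SELECT
  DATE(event_ts) AS event_date,
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
GROUP BY event_date, country;

-- Dashboards query the small rollup directly; BigQuery can also rewrite
-- matching queries against the base table to use the materialized view.
SELECT country, SUM(rev) AS rev
FROM `proj.mart.mv_daily_revenue`
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY country;
```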
Avoid

Common pitfalls

  • Carrying over DIST/SORT thinking: assuming physical distribution patterns translate; BigQuery requires pruning and layout-to-filter alignment.
  • Partitioning after the fact: migrating tables without aligning partitions to common filters.
  • Clustering by folklore: clustering keys chosen without evidence from predicates/join keys.
  • Unbounded MERGE: applying MERGE without scoping to affected partitions, causing large scans.
  • Over-materialization: too many intermediates without controlling refresh cost.
  • Ignoring concurrency: BI refresh spikes overwhelm slots/reservations and create tail latency.
  • No regression gates: improvements disappear after the next model change.
Proof

Validation approach

  • Baseline capture: runtime, bytes scanned, slot time, and output row counts for each top query/pipeline.
  • Pruning checks: confirm partition pruning and predicate pushdown on representative parameters.
  • Before/after evidence: demonstrate improvements in runtime and scan bytes; document tradeoffs.
  • Correctness guardrails: golden queries and KPI aggregates ensure tuning doesn’t change semantics.
  • Regression thresholds: define alerts (e.g., +25% bytes scanned or +30% runtime) and enforce via CI or scheduled checks.
  • Operational monitors: dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
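
Baseline capture can start from BigQuery's own job history. A sketch of the kind of query we use is below; the `region-us` qualifier is an assumption (adjust to your location), and the 7-day window and LIMIT are arbitrary starting points.

```sql
-- Illustrative baseline capture: runtime, bytes, and slot time per query
-- over the last 7 days, ranked by average bytes scanned.
SELECT
  query,
  COUNT(*) AS runs,
  AVG(total_bytes_processed) AS avg_bytes,
  AVG(total_slot_ms) AS avg_slot_ms,
  AVG(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_runtime_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY query
ORDER BY avg_bytes DESC
LIMIT 50;
```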
Execution

Migration steps

A sequence that improves performance while protecting semantics.
  1. Identify top cost and SLA drivers

     Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.

  2. Create baselines and targets

     Capture current BigQuery job metrics (runtime, scan bytes, slot time) and define improvement targets. Freeze golden outputs so correctness doesn’t regress.

  3. Tune layout: partitioning and clustering

     Align partition keys to the most common filters and refresh windows. Choose clustering keys based on observed predicates and join keys—not guesses.

  4. Rewrite for pruning and reuse

     Apply pruning-aware SQL rewrites, reduce reshuffles, pre-aggregate where needed, and scope MERGEs/applies to affected partitions.

  5. Capacity posture and governance

     Set reservations/on-demand posture, tune concurrency for BI refresh peaks, and implement guardrails to prevent scan blowups from new queries.

  6. Add regression gates

     Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.
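
The regression gate in step 6 can be sketched as a scheduled check against job history. `proj.ops.query_baselines` is a hypothetical table you populate from baseline runs (one `baseline_bytes` value per tracked query), and the +25% threshold matches the example given earlier.

```sql
-- Illustrative regression gate: flag queries whose scan bytes over the last
-- day exceed their stored baseline by more than 25%.
-- `proj.ops.query_baselines` is hypothetical; `region-us` is an assumption.
SELECT
  j.query,
  SUM(j.total_bytes_processed) AS bytes_last_day,
  b.baseline_bytes,
  SAFE_DIVIDE(SUM(j.total_bytes_processed), b.baseline_bytes) - 1 AS pct_change
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
JOIN `proj.ops.query_baselines` AS b
  ON j.query = b.query
WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND j.job_type = 'QUERY'
GROUP BY j.query, b.baseline_bytes
HAVING SAFE_DIVIDE(SUM(j.total_bytes_processed), b.baseline_bytes) - 1 > 0.25;
```

Run on a schedule (or in CI against a dry-run), any rows returned become alerts, which is what keeps improvements from silently disappearing after the next model change.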

Workload Assessment
Replace DIST/SORT tuning with BigQuery-native performance

We identify your highest-cost migrated workloads, tune pruning and table layout, and deliver before/after evidence with regression thresholds—so performance improves and stays stable.

Optimization Program
Prevent scan blowups with regression gates

Get an optimization backlog, tuned partitioning/clustering, and performance gates (runtime/bytes/slot thresholds) so future releases don’t reintroduce slow dashboards or high spend.

FAQ

Frequently asked questions

Why did costs increase after moving from Redshift to BigQuery?
Most often because partitioning/clustering wasn’t aligned to filters, and Redshift-era query shapes don’t maximize pruning. We tune layout and rewrite queries to reduce bytes scanned and stabilize spend.
How do you keep optimization from changing results?
We gate tuning with correctness checks: golden queries, KPI aggregates, and checksum-style diffs. Optimizations only ship when outputs remain within agreed tolerances.
Can you optimize UPSERT/MERGE pipelines too?
Yes. We scope MERGEs to affected partitions, design staging boundaries, and validate performance with scan bytes/slot time baselines to prevent unpredictable runtime.
Do you cover reservations and concurrency planning?
Yes. We recommend a capacity posture (on-demand vs reservations), concurrency controls for BI refresh spikes, and monitoring/guardrails so performance stays stable as usage grows.