Workload

Performance tuning & optimization for Redshift → BigQuery

Redshift tuning assumptions (DISTKEY/SORTKEY, VACUUM/ANALYZE, WLM) don’t carry over. We tune BigQuery layout, queries, and capacity so refresh SLAs hold and scan costs stay predictable as volume grows.

At a glance

Input: Redshift performance tuning & optimization logic
Output: BigQuery equivalent (validated)
Common pitfalls
  • Carrying over DIST/SORT thinking: assuming physical distribution patterns translate; BigQuery requires pruning and layout-to-filter alignment.
  • Partitioning after the fact: migrating tables without aligning partitions to common filters.
  • Clustering by folklore: clustering keys chosen without evidence from predicates/join keys.
Context

Why this breaks

Redshift performance tuning is often encoded in physical design and platform operations: DISTKEY/SORTKEY choices, VACUUM/ANALYZE, and WLM queues. After migration, teams keep Redshift-era query shapes and expect BigQuery to behave similarly—then costs spike and SLAs slip because BigQuery’s cost/perf model is different.

Common post-cutover symptoms:

  • Queries scan entire tables because partition filters aren’t pushed down
  • Joins reshuffle large datasets; BI refresh becomes slow and expensive
  • Incremental jobs and MERGEs scan full targets due to missing pruning boundaries
  • Concurrency spikes cause slot contention and tail latency
  • Spend becomes unpredictable because there are no regression gates or guardrails

Optimization replaces Redshift’s DIST/SORT playbook with BigQuery-native, evidence-driven tuning.

Approach

How conversion works

  1. Baseline the top workloads: identify the most expensive and most business-critical queries/pipelines (dashboards, marts, incremental loads).
  2. Diagnose root causes: scan bytes, join patterns, skew, partition pruning, repeated transforms, and MERGE scopes.
  3. Tune table layout: partitioning and clustering aligned to access paths and refresh windows.
  4. Rewrite for pruning and reuse: predicate pushdown-friendly filters, pre-aggregation, materialized views, and de-duplication of expensive transforms.
  5. Capacity & cost governance: reservations/on-demand posture, concurrency controls, and cost guardrails.
  6. Regression gates: baselines + thresholds so future changes don’t reintroduce scan blowups.
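
The layout step (3) above can be sketched in BigQuery DDL. This is a minimal illustration, not a prescription: the table and column names (`proj.mart.fact_orders`, `event_ts`, `country`, `customer_id`) are placeholders, and the clustering columns should come from observed predicates and join keys in your workload.

```sql
-- Illustrative layout: partition the fact table by event date and
-- cluster by the columns that appear most often in filters and joins.
-- All names here are placeholders.
CREATE TABLE `proj.mart.fact_orders_tuned`
PARTITION BY DATE(event_ts)
CLUSTER BY country, customer_id
OPTIONS (
  require_partition_filter = TRUE  -- reject queries that would scan every partition
)
AS
SELECT * FROM `proj.mart.fact_orders`;
```

Setting `require_partition_filter` turns the pruning boundary into a hard guardrail: queries that omit a partition filter fail fast instead of silently scanning the whole table.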

Supported constructs

Representative tuning levers we apply for Redshift → BigQuery workloads.

Source | Target | Notes
DISTKEY/SORTKEY-era performance assumptions | Partitioning + clustering aligned to access paths | Replace physical tuning with pruning-first layout decisions.
WLM queues and concurrency settings | Reservations/slots + concurrency policies | Stabilize refresh SLAs under peak BI load.
UPSERT jobs and incremental loads | Partition-scoped MERGE and pruning-aware staging | Avoid full-target scans and unpredictable runtime.
Repeated BI scans | Pre-aggregation + materialized views (where appropriate) | Reduce scan bytes and stabilize dashboards.
VACUUM/ANALYZE maintenance habits | Layout + rewrite strategy validated via job metrics | Use evidence (bytes/slot time) rather than maintenance folklore.
Ad-hoc expensive queries | Governance: guardrails + cost controls | Prevent scan blowups from unmanaged access.
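
The "partition-scoped MERGE" lever above can be sketched as follows. All dataset, table, and column names are placeholders; the pattern is to precompute the partitions a staging batch touches and restrict the MERGE target to them, so BigQuery prunes the rest of the fact table.

```sql
-- Illustrative partition-scoped MERGE (all names are placeholders).
-- Precompute the affected partition dates from the staging batch.
DECLARE affected_dates ARRAY<DATE> DEFAULT (
  SELECT ARRAY_AGG(DISTINCT DATE(event_ts)) FROM `proj.stage.orders_batch`
);

MERGE `proj.mart.fact_orders` AS t
USING `proj.stage.orders_batch` AS s
ON t.order_id = s.order_id
   AND DATE(t.event_ts) IN UNNEST(affected_dates)  -- pruning boundary on the target
WHEN MATCHED THEN
  UPDATE SET revenue = s.revenue, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, event_ts, country, revenue, updated_at)
  VALUES (s.order_id, s.event_ts, s.country, s.revenue, s.updated_at);
```

Without the date condition on the target, the MERGE scans the full fact table on every incremental run; with it, runtime and bytes scale with the batch, not the table.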

How workload changes

Topic | Redshift | BigQuery
Primary cost driver | Cluster resources + WLM behavior | Bytes scanned + slot time
Data layout impact | DIST/SORT keys can hide suboptimal SQL | Partitioning/clustering must match access paths
Concurrency planning | WLM queues and limits | Slots/reservations + concurrency policies
Optimization style | Often query tuning + maintenance operations | Pruning-aware rewrites + layout + governance

What this means in practice:
  • Primary cost driver: pruning and query shape dominate spend.
  • Data layout impact: layout decisions become first-class performance levers.
  • Concurrency planning: peak BI refresh needs an explicit capacity posture.
  • Optimization style: tuning is holistic, spanning SQL, layout, capacity, and guardrails.

Examples

Illustrative BigQuery optimization patterns after Redshift migration: enforce pruning, pre-aggregate for BI, and store baselines for regression gates. Replace datasets and fields to match your environment.

```sql
-- Pruning-first query shape (fact table partitioned by DATE(event_ts))
SELECT
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
WHERE DATE(event_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;
```
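
For repeated BI scans of the same aggregate, a materialized view can serve the rollup instead of the fact table. This is a sketch under the same placeholder names as above; the view name and column list are illustrative.

```sql
-- Illustrative pre-aggregation: a materialized view serving the dashboard's
-- daily revenue rollup (names are placeholders).
CREATE MATERIALIZED VIEW `proj.mart.mv_daily_revenue`
AS
SELECT
  DATE(event_ts) AS event_date,
  country,
  SUM(revenue) AS rev
FROM `proj.mart.fact_orders`
GROUP BY event_date, country;

-- Dashboards query the small rollup directly; BigQuery can also rewrite
-- matching queries against the base table to use the materialized view.
SELECT country, SUM(rev) AS rev
FROM `proj.mart.mv_daily_revenue`
WHERE event_date BETWEEN @start_date AND @end_date
GROUP BY country;
```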
Avoid

Common pitfalls

  • Carrying over DIST/SORT thinking: assuming physical distribution patterns translate; BigQuery requires pruning and layout-to-filter alignment.
  • Partitioning after the fact: migrating tables without aligning partitions to common filters.
  • Clustering by folklore: clustering keys chosen without evidence from predicates/join keys.
  • Unbounded MERGE: applying MERGE without scoping to affected partitions, causing large scans.
  • Over-materialization: too many intermediates without controlling refresh cost.
  • Ignoring concurrency: BI refresh spikes overwhelm slots/reservations and create tail latency.
  • No regression gates: improvements disappear after the next model change.
Proof

Validation approach

  • Baseline capture: runtime, bytes scanned, slot time, and output row counts for each top query/pipeline.
  • Pruning checks: confirm partition pruning and predicate pushdown on representative parameters.
  • Before/after evidence: demonstrate improvements in runtime and scan bytes; document tradeoffs.
  • Correctness guardrails: golden queries and KPI aggregates ensure tuning doesn’t change semantics.
  • Regression thresholds: define alerts (e.g., +25% bytes scanned or +30% runtime) and enforce via CI or scheduled checks.
  • Operational monitors: dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
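
Baseline capture can start from BigQuery's own job history. A sketch of the kind of query we use is below; the `region-us` qualifier is an assumption (adjust to your location), and the 7-day window and LIMIT are arbitrary starting points.

```sql
-- Illustrative baseline capture: runtime, bytes, and slot time per query
-- over the last 7 days, ranked by average bytes scanned.
SELECT
  query,
  COUNT(*) AS runs,
  AVG(total_bytes_processed) AS avg_bytes,
  AVG(total_slot_ms) AS avg_slot_ms,
  AVG(TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS avg_runtime_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY query
ORDER BY avg_bytes DESC
LIMIT 50;
```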
Execution

Migration steps

A sequence that improves performance while protecting semantics.
  1. Identify top cost and SLA drivers

     Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.

  2. Create baselines and targets

     Capture current BigQuery job metrics (runtime, scan bytes, slot time) and define improvement targets. Freeze golden outputs so correctness doesn’t regress.

  3. Tune layout: partitioning and clustering

     Align partition keys to the most common filters and refresh windows. Choose clustering keys based on observed predicates and join keys—not guesses.

  4. Rewrite for pruning and reuse

     Apply pruning-aware SQL rewrites, reduce reshuffles, pre-aggregate where needed, and scope MERGEs/applies to affected partitions.

  5. Capacity posture and governance

     Set reservations/on-demand posture, tune concurrency for BI refresh peaks, and implement guardrails to prevent scan blowups from new queries.

  6. Add regression gates

     Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.
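
The regression gate in step 6 can be sketched as a scheduled check against job history. `proj.ops.query_baselines` is a hypothetical table you populate from baseline runs (one `baseline_bytes` value per tracked query), and the +25% threshold matches the example given earlier.

```sql
-- Illustrative regression gate: flag queries whose scan bytes over the last
-- day exceed their stored baseline by more than 25%.
-- `proj.ops.query_baselines` is hypothetical; `region-us` is an assumption.
SELECT
  j.query,
  SUM(j.total_bytes_processed) AS bytes_last_day,
  b.baseline_bytes,
  SAFE_DIVIDE(SUM(j.total_bytes_processed), b.baseline_bytes) - 1 AS pct_change
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
JOIN `proj.ops.query_baselines` AS b
  ON j.query = b.query
WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND j.job_type = 'QUERY'
GROUP BY j.query, b.baseline_bytes
HAVING SAFE_DIVIDE(SUM(j.total_bytes_processed), b.baseline_bytes) - 1 > 0.25;
```

Run on a schedule (or in CI against a dry-run), any rows returned become alerts, which is what keeps improvements from silently disappearing after the next model change.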

Workload Assessment
Replace DIST/SORT tuning with BigQuery-native performance

We identify your highest-cost migrated workloads, tune pruning and table layout, and deliver before/after evidence with regression thresholds—so performance improves and stays stable.

Optimization Program
Prevent scan blowups with regression gates

Get an optimization backlog, tuned partitioning/clustering, and performance gates (runtime/bytes/slot thresholds) so future releases don’t reintroduce slow dashboards or high spend.

FAQ

Frequently asked questions

Why did costs increase after moving from Redshift to BigQuery?
Most often because partitioning/clustering wasn’t aligned to filters, and Redshift-era query shapes don’t maximize pruning. We tune layout and rewrite queries to reduce bytes scanned and stabilize spend.
How do you keep optimization from changing results?
We gate tuning with correctness checks: golden queries, KPI aggregates, and checksum-style diffs. Optimizations only ship when outputs remain within agreed tolerances.
Can you optimize UPSERT/MERGE pipelines too?
Yes. We scope MERGEs to affected partitions, design staging boundaries, and validate performance with scan bytes/slot time baselines to prevent unpredictable runtime.
Do you cover reservations and concurrency planning?
Yes. We recommend a capacity posture (on-demand vs reservations), concurrency controls for BI refresh spikes, and monitoring/guardrails so performance stays stable as usage grows.