Performance tuning & optimization for Teradata -> BigQuery
Teradata tuning assumptions (PI/AMP distribution, stats, spool) don’t carry over. We tune BigQuery queries, table layout, and capacity so dashboards hit SLAs and scan costs stay predictable as data grows.
- Input: Teradata performance tuning & optimization logic
- Output: BigQuery equivalent (validated)
Why this breaks
Teradata performance tuning is often encoded in physical design and optimizer expectations: Primary Index choices, AMP-local joins, collected stats, spool behavior, and workload management rules. When you migrate the schema "as-is" to BigQuery, queries may still run, but the execution model is different, so performance can collapse or costs can spike.
Common post-cutover symptoms:
- Teradata-era “physical tuning” treated as schema; BigQuery queries scan too much
- Joins that were AMP-local now reshuffle large datasets; runtimes blow up
- Reused volatile tables / intermediate steps become expensive repeated scans
- Heavy BI queries miss SLAs due to concurrency spikes and slot contention
- Batch windows slip because incremental jobs aren’t pruning-aware
Optimization is how you replace Teradata’s PI/AMP playbook with a BigQuery-native, evidence-driven performance posture.
How conversion works
- Baseline the top workloads: identify the most expensive and most business-critical queries (BI dashboards, marts, batch transforms).
- Map Teradata tuning assumptions: where PI/AMP locality, stats, spool, and volatile tables were doing the work.
- Tune table layout in BigQuery: partitioning + clustering aligned to real access paths (filters + join keys).
- Rewrite SQL for pruning and reduced reshuffles: predicate pushdown-friendly filters, join strategy adjustments, pre-aggregation, and incremental scoping.
- Materialize strategically: precomputed aggregates / incremental snapshots where BI patterns repeatedly scan large facts.
- Capacity & governance: reservations/on-demand posture, concurrency controls for BI refresh spikes, and cost guardrails.
- Regression gates: baselines + thresholds so improvements persist and new releases don’t reintroduce scan blowups.
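The layout-tuning step above can be sketched as BigQuery DDL. This is a minimal illustration, not a prescription: the project, dataset, table, and column names are placeholders, and the partition/cluster keys assume the common access path is a date-range filter plus store/product predicates.

```sql
-- Replace PI-based physical tuning with pruning-oriented layout.
-- All names below are illustrative; align keys to YOUR observed predicates.
CREATE TABLE `proj.mart.fact_sales`
PARTITION BY DATE(txn_ts)              -- prunes the common date-range filters
CLUSTER BY store_id, product_id        -- matches observed predicates/join keys
OPTIONS (
  require_partition_filter = TRUE      -- guardrail: reject full-table scans
)
AS
SELECT * FROM `proj.staging.fact_sales_raw`;
```

`require_partition_filter` is a useful cutover guardrail: queries that omit a partition predicate fail fast instead of silently scanning the whole table.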
Supported constructs
Representative tuning levers we apply for Teradata -> BigQuery workloads.
| Source | Target | Notes |
|---|---|---|
| Teradata PI/AMP locality assumptions | BigQuery partitioning + clustering | Replace physical-locality tuning with pruning and layout aligned to access paths. |
| Collected stats + optimizer hints | Query rewrites + layout choices validated by job metrics | Use evidence (bytes/slot time) instead of stats-driven expectations. |
| Volatile/intermediate tables | Strategic materializations + incremental staging | Avoid repeated scans; precompute where BI patterns demand it. |
| Large fact-table joins | Pruning-aware join patterns + pre-aggregation | Reduce reshuffles and stabilize runtime under concurrency. |
| Workload management / query classes | Reservations/slots + concurrency policies | Predictable performance for peak BI refresh and batch windows. |
| Spool-sensitive query patterns | Governance: guardrails + cost controls | Prevent runaway scans and long-tail cost/perf regressions. |
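As one concrete form of the "strategic materializations" lever in the table above, a materialized view can replace a Teradata volatile table that was rebuilt on every run. A sketch under stated assumptions: the base table is partitioned by `DATE(txn_ts)`, and all names are hypothetical.

```sql
-- Pre-aggregate a hot BI pattern so dashboards read a small rollup
-- instead of rescanning the large fact table on every refresh.
CREATE MATERIALIZED VIEW `proj.mart.daily_store_sales`
PARTITION BY txn_date                  -- carries pruning through to the rollup
AS
SELECT
  DATE(txn_ts) AS txn_date,
  store_id,
  SUM(net_sales) AS net_sales,
  COUNT(*)      AS txn_count
FROM `proj.mart.fact_sales`
GROUP BY txn_date, store_id;
```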
How workload changes
| Topic | Teradata | BigQuery |
|---|---|---|
| Primary tuning lever | PI/AMP distribution + collected stats | Partitioning/clustering + pruning-aware SQL |
| Cost driver | System resources / workload class limits | Bytes scanned + slot time |
| Intermediate results | Volatile tables and spool behavior common | Materialize selectively; minimize repeated scans |
| Concurrency planning | Workload management rules and queues | Reservations/slots + concurrency policies |
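Because bytes scanned and slot time are the primary BigQuery cost drivers, tuning candidates can be ranked directly from job metadata rather than from stats-driven expectations. A minimal sketch using the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view (the `region-us` qualifier is an example; use your region):

```sql
-- Rank last week's queries by bytes scanned to build the tuning backlog.
SELECT
  query,
  total_bytes_processed / POW(1024, 3) AS gb_scanned,
  total_slot_ms / 1000                 AS slot_seconds,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 20;
```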
Examples
Illustrative BigQuery optimization patterns after Teradata migration: enforce pruning, pre-aggregate, and set regression gates. Replace datasets and fields to match your environment.
```sql
-- Pruning-first query shape (fact table partitioned by DATE(txn_ts))
SELECT
  store_id,
  SUM(net_sales) AS net_sales
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date
GROUP BY 1;
```

Common pitfalls
- PI/AMP thinking carried over: assuming the same distribution/locality behaviors exist in BigQuery.
- No pruning strategy: partitioning is missing or not aligned to common filters, causing full scans.
- Clustering by folklore: clustering keys chosen without evidence from predicates and join keys.
- Volatile-table reliance: Teradata intermediate tables become expensive repeated scans without materialization strategy.
- Skew blindness: joins on highly skewed keys cause disproportionate shuffles and slowdowns.
- Concurrency surprises: BI refresh peaks overwhelm slots/reservations and create tail latency.
- No regression gates: performance fixes disappear after the next model change.
Validation approach
- Baseline capture: for each top query/pipeline, record runtime, bytes scanned, slot time, and output row counts.
- Pruning checks: confirm partition pruning and predicate pushdown on representative parameters and common BI filters.
- Before/after evidence: demonstrate improvements in runtime and scan bytes; document any tradeoffs.
- Correctness guardrails: golden queries and KPI aggregates ensure tuning doesn’t change semantics.
- Regression thresholds: define alert thresholds (e.g., +25% bytes scanned or +30% runtime) and enforce via CI or scheduled checks.
- Operational monitors: post-tuning dashboards for scan bytes, slot utilization, failures, and refresh SLA adherence.
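A golden-query correctness guardrail can be expressed as a diff between baseline and tuned KPI outputs: an empty result means the rewrite preserved semantics. The QA snapshot tables here are assumptions; any naming works as long as both sides run the same KPI definition.

```sql
-- Any row returned means the tuned query changed KPI semantics.
SELECT
  store_id,
  b.net_sales AS baseline_net_sales,
  t.net_sales AS tuned_net_sales
FROM `proj.qa.kpi_baseline` AS b
FULL OUTER JOIN `proj.qa.kpi_tuned` AS t
  USING (store_id)
WHERE b.net_sales IS DISTINCT FROM t.net_sales;  -- NULL-safe comparison
```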
Migration steps
- 01
Identify top cost and SLA drivers
Rank queries and pipelines by bytes scanned, slot time, and business criticality (dashboard SLAs, batch windows). Select a tuning backlog with clear owners.
- 02
Translate Teradata tuning assumptions
Document where PI/AMP locality, stats collection, and volatile tables were doing performance work. Decide the BigQuery-native replacement: pruning, layout, materialization, or rewrites.
- 03
Tune layout: partitioning and clustering
Align partitions to time windows and common BI filters. Choose clustering keys based on observed predicates and join keys to reduce scan and reshuffle.
- 04
Rewrite for pruning and reduced reshuffles
Apply pruning-aware filters, reduce cross joins and broad reshuffles, and pre-aggregate where BI patterns repeatedly scan large facts.
- 05
Capacity posture and governance
Set reservations/on-demand posture, tune concurrency for BI refresh peaks, and implement guardrails to prevent scan blowups from new queries.
- 06
Add regression gates
Codify performance thresholds and alerting so future changes don’t reintroduce high scan bytes or missed SLAs. Monitor post-cutover metrics continuously.
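The regression gate in step 06 can be sketched as a scheduled query against metrics you persist per pipeline. `proj.ops.job_metrics` and `proj.ops.perf_baselines` are hypothetical tables; the 25% threshold matches the example in the validation approach above.

```sql
-- Flag pipelines whose 7-day average bytes scanned exceeds baseline by >25%.
SELECT
  j.query_label,
  AVG(j.total_bytes_processed) AS recent_avg_bytes,
  b.baseline_bytes
FROM `proj.ops.job_metrics` AS j
JOIN `proj.ops.perf_baselines` AS b
  USING (query_label)
WHERE j.run_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY j.query_label, b.baseline_bytes
HAVING AVG(j.total_bytes_processed) > 1.25 * b.baseline_bytes;
```

Wire the result into alerting or a CI check so a nonempty result blocks or flags the release.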
We identify your highest-cost Teradata-migrated queries, tune pruning and table layout, and deliver before/after evidence with regression thresholds, so performance improves and stays stable.
Get an optimization backlog, tuned partitioning/clustering, and performance gates (runtime/bytes/slot thresholds) so future releases don’t reintroduce slow dashboards or high spend.