Workload

Validation & reconciliation for Teradata -> BigQuery

Turn "it compiles" into a measurable parity contract. We prove correctness for batch and incremental systems with golden queries, KPI diffs, and replayable integrity simulations-then gate cutover with rollback-ready criteria.

At a glance
Input
Teradata Validation & reconciliation logic
Output
BigQuery equivalent (validated)
Context

Why this breaks

Teradata migrations fail late when teams validate only syntax and a handful of spot checks. Teradata estates encode decades of implicit behavior: PI/AMP-era locality assumptions, volatile-table workflows, and "good enough" ordering in top-N and window logic. BigQuery will execute translated workloads, but drift appears when the correctness rules were never written down or stress-tested.

Common drift drivers in Teradata -> BigQuery:

  • Windowing and TOP ambiguity: QUALIFY/top-N logic without deterministic tie-breakers
  • Type and NULL semantics: CASE/COALESCE branches and join keys cast differently
  • Date/time edge cases: boundary days, truncation, and DATE vs TIMESTAMP intent
  • Intermediate/staging behavior: volatile-table patterns replaced incorrectly, changing apply boundaries
  • Incremental behavior: reruns, restarts, and backfill windows weren’t validated as first-class scenarios

Validation must treat the system as operational and incremental, not a one-time batch.

Approach

How conversion works

  1. Define the parity contract: what must match (tables, KPIs, dashboards), at what granularity, and with what tolerances.
  2. Build validation datasets: golden inputs, edge cohorts (ties, null-heavy segments), and representative windows (including boundary dates).
  3. Run readiness + execution gates: schema/type alignment, dependency readiness, compile/run reliability.
  4. Run layered parity gates: counts/profiles -> KPI diffs -> targeted row-level diffs for flagged cohorts.
  5. Validate incremental integrity where applicable: reruns, restart simulations, backfills, and late-arrival injections.
  6. Gate cutover: pass/fail thresholds, canary strategy, rollback triggers, and post-cutover monitors.
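
The KPI-diff gate in step 4 can be sketched as a single BigQuery query. Everything here is illustrative: `proj.staging.td_fact_sales` (the Teradata extract staged in BigQuery), `proj.mart.fact_sales` (the migrated output), the grouping keys, and the 0.1% relative tolerance are placeholders to replace with your parity contract.

```sql
-- KPI parity diff (illustrative): flag date/region cells where the migrated
-- revenue KPI deviates from the staged Teradata extract beyond tolerance.
WITH src AS (
  SELECT sales_date, region, SUM(revenue) AS revenue
  FROM `proj.staging.td_fact_sales`
  GROUP BY sales_date, region
),
tgt AS (
  SELECT sales_date, region, SUM(revenue) AS revenue
  FROM `proj.mart.fact_sales`
  GROUP BY sales_date, region
)
SELECT
  sales_date,
  region,
  s.revenue AS src_revenue,
  t.revenue AS tgt_revenue
FROM src AS s
FULL OUTER JOIN tgt AS t USING (sales_date, region)
WHERE s.revenue IS NULL        -- cell exists only in target
   OR t.revenue IS NULL        -- cell exists only in source
   OR ABS(s.revenue - t.revenue) > 0.001 * ABS(s.revenue);  -- 0.1% tolerance
```

An empty result set is the pass condition; any returned rows become the input to targeted row-level diffs.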

Supported constructs

Representative validation and reconciliation mechanisms we apply in Teradata -> BigQuery migrations.

| Source | Target | Notes |
|---|---|---|
| Golden queries and reports | Golden query harness + repeatable parameter sets | Codifies business sign-off into runnable tests. |
| Counts and column profiles | Partition-level counts + null/min/max/distinct profiles | Cheap early drift detection before deep diffs. |
| KPI validation | Aggregate diffs by key dimensions + tolerance thresholds | Aligns validation with business meaning. |
| Row-level diffs | Targeted sampling diffs + edge cohort tests | Use deep diffs only where aggregates signal drift. |
| ETL restartability | Reruns, restart simulations, and backfill windows | Validates control-table semantics and idempotency. |
| Operational sign-off | Canary gates + rollback criteria + monitors | Prevents cutover from becoming a long KPI debate. |
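
The row-level diff mechanism above can be implemented with BigQuery's `EXCEPT DISTINCT`. A minimal sketch, assuming placeholder tables `proj.staging.td_orders` and `proj.mart.orders`, placeholder key columns, and a `@check_date` parameter for the flagged cohort:

```sql
-- Targeted row-level diff (illustrative): rows present on one side but not
-- the other for a single flagged date cohort.
SELECT 'src_only' AS side, * FROM (
  SELECT order_id, customer_id, amount FROM `proj.staging.td_orders`
  WHERE DATE(order_ts) = @check_date
  EXCEPT DISTINCT
  SELECT order_id, customer_id, amount FROM `proj.mart.orders`
  WHERE DATE(order_ts) = @check_date
)
UNION ALL
SELECT 'tgt_only' AS side, * FROM (
  SELECT order_id, customer_id, amount FROM `proj.mart.orders`
  WHERE DATE(order_ts) = @check_date
  EXCEPT DISTINCT
  SELECT order_id, customer_id, amount FROM `proj.staging.td_orders`
  WHERE DATE(order_ts) = @check_date
);
```

Because `EXCEPT DISTINCT` compares full rows, scope the column list to the columns in the parity contract rather than `SELECT *` on wide tables.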

How workload changes

| Topic | Teradata | BigQuery |
|---|---|---|
| Drift drivers | PI/AMP-era query idioms and volatile-table workflows hide assumptions | Explicit ordering, casting, and apply boundaries required |
| Cost of validation | Resource governance and workload classes | Bytes scanned + slot time |
| Incremental behavior | Restartability conventions and control tables common | Rerun/backfill behavior must be simulated explicitly |

  • Drift drivers: Validation focuses on making assumptions testable.
  • Cost of validation: Use layered gates to keep validation economical.
  • Incremental behavior: Integrity simulations are mandatory gates.

Examples

Illustrative parity and integrity checks in BigQuery. Replace datasets, keys, and KPI definitions to match your migration.

-- Partition/window row counts (BigQuery)
SELECT
  DATE(txn_ts) AS d,
  COUNT(*) AS row_count  -- ROWS is a reserved keyword in BigQuery; don't use it as a bare alias
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date
GROUP BY 1
ORDER BY 1;
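
Column profiles extend the same cheap gate. The sketch below assumes the same placeholder table and date parameters, with `customer_id` and `amount` standing in for your key columns; run it on both sides and diff the single result row.

```sql
-- Null/min/max/distinct profile for key columns (illustrative).
-- Run against both the staged Teradata extract and the migrated table.
SELECT
  COUNT(*)                     AS row_count,
  COUNTIF(customer_id IS NULL) AS null_customer_id,
  COUNT(DISTINCT customer_id)  AS distinct_customer_id,
  MIN(amount)                  AS min_amount,
  MAX(amount)                  AS max_amount
FROM `proj.mart.fact_sales`
WHERE DATE(txn_ts) BETWEEN @start_date AND @end_date;
```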
Avoid

Common pitfalls

  • Spot-check validation: a few samples miss drift in ties and edge cohorts.
  • No tolerance model: teams argue about “small diffs” because thresholds were never defined.
  • Wrong comparison level: comparing raw rows when the business cares about rollups (or vice versa).
  • Ignoring incremental behavior: parity looks fine on a static snapshot but fails under reruns/backfills.
  • Unstable ordering: top-N/window functions without complete ORDER BY.
  • Cost-blind diffs: exhaustive row-level diffs can be expensive; use layered gates (cheap -> deep).
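
The unstable-ordering pitfall is fixed by completing the `ORDER BY` with an explicit tie-breaker. A sketch using `QUALIFY`, which BigQuery shares with Teradata; `proj.mart.customer_revenue` and its columns are placeholders:

```sql
-- Deterministic top-N (illustrative): the trailing customer_id tie-breaker
-- makes the ranking stable across reruns and across both engines.
SELECT *
FROM `proj.mart.customer_revenue`
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY region
  ORDER BY revenue DESC, customer_id  -- complete ORDER BY: ties resolve identically
) <= 10;
```

Without the tie-breaker, rows with equal `revenue` can swap in and out of the top 10 between runs, producing phantom diffs that are not real drift.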
Proof

Validation approach

Gate set (layered)

Gate 0 - Readiness

  • BigQuery datasets, permissions, and target schemas exist
  • Dependent assets deployed (UDFs/routines, reference data, control tables)

Gate 1 - Execution

  • Converted workloads compile and run reliably
  • Deterministic ordering + explicit casts enforced for high-risk patterns

Gate 2 - Structural parity

  • Row counts by partitions/windows
  • Null/min/max/distinct profiles for key columns

Gate 3 - KPI parity

  • KPI aggregates by key dimensions
  • Rankings and top-N validated on edge windows (ties, boundary dates)

Gate 4 - Integrity (incremental systems)

  • Idempotency: rerun same window -> no net change
  • Restart simulation: fail mid-run -> resume -> correct final state
  • Backfill: historical windows replay without drift
  • Late-arrival: inject late corrections -> only expected rows change
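
The idempotency gate above can be scripted by fingerprinting the affected window before and after a rerun; equal fingerprints mean no net change. The table name and window parameters are placeholders:

```sql
-- Window fingerprint (illustrative): capture once before the rerun and once
-- after; both columns must match exactly for the idempotency gate to pass.
SELECT
  COUNT(*) AS row_count,
  BIT_XOR(FARM_FINGERPRINT(TO_JSON_STRING(t))) AS window_fingerprint
FROM `proj.mart.fact_sales` AS t
WHERE DATE(txn_ts) BETWEEN @window_start AND @window_end;
```

`BIT_XOR` over per-row fingerprints is order-independent, so it compares window contents without requiring a sorted export. The same query also serves the backfill and late-arrival gates: after a late-arrival injection, only the windows expected to change should show a new fingerprint.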

Gate 5 - Cutover & monitoring

  • Canary criteria + rollback triggers
  • Post-cutover monitors: latency, scan bytes/slot time, diff sentinels on critical KPIs
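
A diff sentinel for a critical KPI might look like the following scheduled check against a stored baseline. The baseline table `proj.ops.kpi_baseline` and the 0.5% tolerance are hypothetical; wire the non-empty result into your alerting:

```sql
-- Post-cutover diff sentinel (illustrative): flag yesterday's KPI if it
-- drifts beyond tolerance from the agreed baseline. Empty result = healthy.
SELECT
  m.kpi_date,
  m.revenue AS observed,
  b.revenue AS baseline
FROM (
  SELECT DATE(txn_ts) AS kpi_date, SUM(revenue) AS revenue
  FROM `proj.mart.fact_sales`
  WHERE DATE(txn_ts) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY kpi_date
) AS m
JOIN `proj.ops.kpi_baseline` AS b USING (kpi_date)
WHERE ABS(m.revenue - b.revenue) > 0.005 * ABS(b.revenue);  -- 0.5% tolerance
```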
Execution

Migration steps

A practical sequence for making validation fast, repeatable, and dispute-proof.
  1. Define the parity contract

     Decide what must match (tables, dashboards, KPIs), at what granularity, and with what tolerance thresholds. Identify golden queries and business sign-off owners.

  2. Create validation datasets and edge cohorts

     Select representative time windows and cohorts that trigger edge behavior (ties/top-N, null-heavy segments, boundary dates, skewed keys).

  3. Implement layered gates

     Start with cheap checks (counts/profiles), then KPI diffs, then deep diffs only where needed. Codify gates into runnable jobs so validation is repeatable.

  4. Validate incremental integrity

     Run idempotency reruns, restart simulations, backfill windows, and late-arrival injections. These are the scenarios that usually break if not tested.

  5. Gate cutover and monitor

     Establish canary/rollback criteria and post-cutover monitors for critical KPIs and pipeline health (latency, failures, scan bytes/slot time).

Workload Assessment
Make parity measurable before you cut over

We define your parity contract, build the golden-query set, and implement layered reconciliation gates (including reruns, restarts, and backfill simulations) so KPI drift is caught before production cutover.

Cutover Readiness
Gate cutover with evidence and rollback criteria

Get a validation plan, runnable gates, and sign-off artifacts (diff reports, thresholds, monitors) so the Teradata -> BigQuery cutover is controlled and dispute-proof.

FAQ

Frequently asked questions

Do we need row-level diffs for everything?
Usually no. A layered approach is faster and cheaper: start with counts/profiles and KPI aggregates, then do targeted row-level diffs only where aggregates signal drift or for critical entities.
How do you handle small KPI differences?
We define tolerance thresholds up front (exact vs %/absolute). If differences exceed thresholds, we trace back using dimensional rollups and targeted sampling to isolate the drift source.
What if the system has restartability and backfills?
Then validation must include simulations: reruns, restart scenarios, backfill windows, and late-arrival injections. These gates prove the migrated system behaves correctly under operational stress.
How does validation tie into cutover?
We convert gates into cutover criteria: pass/fail thresholds, canary rollout, rollback triggers, and post-cutover monitors. Cutover becomes an evidence-based decision, not a debate.