Validation & reconciliation for Oracle → BigQuery
Turn “it compiles” into a measurable parity contract. We prove correctness for batch and incremental Oracle-fed systems—including watermark/SCN behavior—then gate cutover with evidence and rollback-ready criteria.
- Input
- Oracle Validation & reconciliation logic
- Output
- BigQuery equivalent (validated)
- Common pitfalls
- Validating only the backfill: parity on a static snapshot doesn’t prove correctness under reruns and late updates.
- No durable watermark: using job runtime or ad-hoc queries instead of persisted SCN/high-water marks.
- No ordering tie-breakers: CDC events processed nondeterministically; reruns drift.
Why this breaks
Oracle migrations fail late when teams validate only syntax and a few reports. Oracle systems encode correctness in operational behavior: SCN/watermark rules, change ordering, constraints/triggers, and rerun/backfill conventions. BigQuery can implement equivalent outcomes, but only if these rules are made explicit and validated under retries, late updates, and partial failures.
Common drift drivers in Oracle → BigQuery:
- Watermark/SCN drift: wrong high-water mark selection or non-durable state leads to missed/duplicate changes
- MERGE semantics drift: match keys, casts, and NULL/empty-string intent differ
- Ordering ambiguity: CDC events need deterministic tie-breakers; otherwise reruns choose different winners
- Trigger side effects lost: audits/derived fields/control table updates disappear unless recreated
- SCD drift: end-dating/current-flag logic breaks under backfills and late updates
Validation must treat the system as an incremental, operational workload, not a one-time batch.
How conversion works
- Define the parity contract: what must match (tables, dashboards, KPIs) and what tolerances apply (exact vs threshold).
- Define the incremental contract: watermarks (SCN/timestamp), ordering/tie-breakers, dedupe rule, late-arrival window policy, and restart semantics.
- Build validation datasets: golden inputs, edge cohorts (ties, null-heavy segments), and representative windows (including boundary dates).
- Run readiness + execution gates: schema/type alignment, dependency readiness (routines, DQ checks), and job reliability.
- Run layered parity gates: counts/profiles → KPI diffs → targeted row-level diffs where needed.
- Validate incremental integrity (mandatory for CDC/upserts): idempotency reruns, restart simulations, backfill windows, and late-arrival injections.
- Gate cutover: pass/fail thresholds, canary rollout, rollback triggers, and post-cutover monitors.
Supported constructs
Representative validation and reconciliation mechanisms we apply in Oracle → BigQuery migrations.
| Source | Target | Notes |
|---|---|---|
| Golden queries and reports | Golden query harness + repeatable parameter sets | Codifies business sign-off into runnable tests. |
| Watermarks (SCN/timestamp) | Durable control tables + restart simulations | Prevents missed/duplicate changes under retries. |
| KPI validation | Aggregate diffs by key dimensions + tolerance thresholds | Aligns validation with business meaning. |
| Counts and column profiles | Partition-level counts + null/min/max/distinct profiles | Cheap early drift detection before deep diffs. |
| Incremental behavior | Idempotency + late-arrival + backfill simulations | Proves correctness under operational stress. |
| Operational sign-off | Canary gates + rollback criteria + monitors | Prevents cutover from becoming a long KPI debate. |
How workload changes
| Topic | Oracle | BigQuery |
|---|---|---|
| Correctness state | SCN/watermark logic often embedded in tools and scripts | Durable control tables + explicit replay semantics |
| Data quality enforcement | Constraints/triggers can enforce invariants implicitly | Explicit DQ gates and drift policies |
| Cutover evidence | Often based on limited report checks | Layered gates + rollback triggers |
Examples
Illustrative parity and integrity checks in BigQuery. Replace datasets, keys, and KPI definitions to match your Oracle migration.
-- Window-level row counts
SELECT
DATE(updated_at) AS d,
COUNT(*) AS rows
FROM `proj.mart.fact_orders`
WHERE DATE(updated_at) BETWEEN @start_date AND @end_date
GROUP BY 1
ORDER BY 1;Common pitfalls
- Validating only the backfill: parity on a static snapshot doesn’t prove correctness under reruns and late updates.
- No durable watermark: using job runtime or ad-hoc queries instead of persisted SCN/high-water marks.
- No ordering tie-breakers: CDC events processed nondeterministically; reruns drift.
- Ignoring triggers/constraints: data quality degrades silently when Oracle enforcement disappears.
- No tolerance model: teams argue about diffs because thresholds weren’t defined upfront.
- Cost-blind diffs: exhaustive row-level diffs can be expensive; use layered gates (cheap→deep).
Validation approach
Gate set (layered)
Gate 0 — Readiness
- BigQuery datasets, permissions, and target schemas exist
- Dependent assets deployed (UDFs/procedures, reference data, DQ checks, control tables)
Gate 1 — Execution
- Converted jobs compile and run reliably under representative volumes
- Deterministic ordering + explicit casts enforced
Gate 2 — Structural parity
- Row counts by partition/window
- Null/min/max/distinct profiles for key columns
Gate 3 — KPI parity
- KPI aggregates by key dimensions
- Ranking/top-N parity validated on tie/edge cohorts
Gate 4 — Incremental integrity (mandatory)
- Idempotency: rerun same window → no net change
- Restart simulation: fail mid-run → resume → correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections → only expected rows change
- Watermark correctness: SCN/timestamp advances correctly and is durable
Gate 5 — Cutover & monitoring
- Canary criteria + rollback triggers
- Post-cutover monitors: latency, failures, watermark lag, KPI sentinels
Migration steps
- 01
Define parity and incremental contracts
Decide what must match (tables, dashboards, KPIs) and define tolerances. Make watermarks (SCN/timestamp), ordering, and late-arrival rules explicit.
- 02
Create validation datasets and edge cohorts
Select representative windows and cohorts that trigger edge behavior (ties, null-heavy segments, boundary dates) and incremental stress cases (late updates).
- 03
Implement layered gates
Start with cheap checks (counts/profiles), then KPI diffs, then deep diffs only where needed. Add watermark progression and restart simulations for CDC/upsert pipelines.
- 04
Run incremental integrity simulations
Rerun the same window, simulate partial failure and resume, replay backfills, and inject late updates. Verify only expected rows change and watermarks advance correctly.
- 05
Gate cutover and monitor
Establish canary/rollback criteria and post-cutover monitors for KPI sentinels, watermark lag, failures, and latency.
We define parity + incremental contracts, build golden queries, and implement layered reconciliation gates—including watermark/restart simulations—so Oracle→BigQuery cutover is evidence-based.
Get a validation plan, runnable gates, and sign-off artifacts (diff reports, thresholds, watermark evidence, monitors) so Oracle→BigQuery cutover is controlled and dispute-proof.