Validation & reconciliation for Redshift → Snowflake
Turn “it runs” into a measurable parity contract. We prove correctness *and* credit posture with golden queries, KPI diffs, and replayable integrity simulations—then gate cutover with rollback-ready criteria.
- Input: Redshift validation & reconciliation logic
- Output: Snowflake equivalent (validated)
Why this breaks
Redshift migrations fail late when teams validate only compilation and a few dashboard screenshots. Redshift systems encode correctness in operational conventions: staging + delete/insert UPSERTs, implicit casts, WLM-driven behavior, and control-table semantics for retries and watermarks. Snowflake can implement equivalent outcomes, but drift and credit spikes appear when ordering/tie-breakers, casting/NULL intent, and apply scope aren’t made explicit and validated under retries, backfills, and partial failures.
Common drift drivers in Redshift → Snowflake:
- UPSERT drift: delete/insert patterns behave differently under retries and partial failures
- Implicit casts & NULL semantics: CASE/COALESCE and join keys change match behavior
- Window/top-N ambiguity: missing tie-breakers changes winners under parallelism
- Late-arrival behavior: reprocessing conventions not recreated → late updates ignored or double-counted
- Cost spikes: unbounded MERGE/apply scans full targets and burns credits
Validation must treat this as an incremental, operational workload and include credit posture as a first-class gate.
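For example, a MERGE whose source is restricted to the incremental window and deduplicated with an explicit tie-breaker keeps scans bounded and reruns idempotent. A minimal sketch; table, key, and column names are placeholders to adapt to your schema:

```sql
-- Illustrative bounded MERGE: apply only the current window, deduped
-- deterministically so retries converge to the same final state.
MERGE INTO MART.FACT_ORDERS AS t
USING (
    SELECT *
    FROM STG.ORDERS_DELTA
    WHERE updated_at >= :window_start
      AND updated_at <  :window_end
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY order_id
        ORDER BY updated_at DESC, source_seq DESC  -- explicit tie-breaker
    ) = 1
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET
    t.status = s.status, t.amount = s.amount, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, status, amount, updated_at)
    VALUES (s.order_id, s.status, s.amount, s.updated_at);
```

The window predicate on the source is what keeps the scan bounded; validating that it exists and is honored is part of the cost gate, not just the correctness gate.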
How conversion works
- Define the parity contract: what must match (tables, dashboards, KPIs) and tolerances (exact vs threshold).
- Define the incremental contract: windows/watermarks, ordering/tie-breakers, dedupe rule, late-arrival policy, and restart semantics.
- Define the cost contract: which workloads must stay bounded and what credit/runtime thresholds are acceptable.
- Build validation datasets: golden inputs, edge cohorts (ties, null-heavy segments), and representative windows (including boundary dates).
- Run layered parity gates: counts/profiles → KPI diffs → targeted row-level diffs where needed.
- Validate operational integrity: idempotency reruns, restart simulations, late-arrival injections, and backfill replays.
- Gate cutover: pass/fail thresholds, canary rollout, rollback triggers, and post-cutover monitors.
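The cheap end of the layered gates can be a single window-scoped profile query per table. A sketch using an illustrative MART.FACT_ORDERS table; column names are assumptions:

```sql
-- Illustrative column profile for early drift detection: run on both
-- sides for the same window and diff the results before any deep checks.
SELECT
    COUNT(*)                       AS row_count,
    COUNT_IF(customer_id IS NULL)  AS null_customer_ids,
    MIN(order_date)                AS min_order_date,
    MAX(order_date)                AS max_order_date,
    COUNT(DISTINCT order_id)       AS distinct_orders
FROM MART.FACT_ORDERS
WHERE TO_DATE(updated_at) BETWEEN :start_d AND :end_d;
```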
Supported constructs
Representative validation and reconciliation mechanisms we apply in Redshift → Snowflake migrations.
| Source | Target | Notes |
|---|---|---|
| Golden dashboards/queries | Golden query harness + repeatable parameter sets | Codifies business sign-off into runnable tests. |
| UPSERT behavior (DELETE+INSERT) | MERGE parity contract + idempotency simulations | Proves behavior under retries and partial failures. |
| Counts and profiles | Window-level counts + null/min/max/distinct profiles | Cheap early drift detection before deep diffs. |
| KPI validation | Aggregate diffs by key dimensions + tolerance thresholds | Aligns validation with business meaning. |
| Cost posture | Credit/runtime baselines + regression thresholds | Treat credit stability as part of cutover readiness. |
| Operational sign-off | Canary gates + rollback criteria + monitors | Makes cutover dispute-proof. |
How workload changes
| Topic | Redshift | Snowflake |
|---|---|---|
| Correctness hiding places | ETL conventions (delete/insert) + job structure | Explicit contracts + replayable gates |
| Cost model | Cluster utilization and WLM | Warehouse credits + bounded scans |
| Cutover evidence | Often based on limited report checks | Layered gates + rollback triggers |
Examples
Illustrative parity and integrity checks in Snowflake. Replace schemas, keys, and KPI definitions to match your migration.
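A KPI-level diff with an explicit tolerance might look like the following, assuming the Redshift-produced mart has been staged side by side as MIGRATION.RS_FACT_ORDERS; names and the 0.1% threshold are illustrative:

```sql
-- Illustrative KPI diff with tolerance: surface only dimensions whose
-- revenue disagrees beyond the agreed threshold.
WITH rs AS (
    SELECT region, SUM(amount) AS revenue
    FROM MIGRATION.RS_FACT_ORDERS
    WHERE TO_DATE(updated_at) BETWEEN :start_d AND :end_d
    GROUP BY region
),
sf AS (
    SELECT region, SUM(amount) AS revenue
    FROM MART.FACT_ORDERS
    WHERE TO_DATE(updated_at) BETWEEN :start_d AND :end_d
    GROUP BY region
)
SELECT
    COALESCE(rs.region, sf.region)                            AS region,
    rs.revenue                                                AS rs_revenue,
    sf.revenue                                                AS sf_revenue,
    ABS(COALESCE(rs.revenue, 0) - COALESCE(sf.revenue, 0))    AS abs_diff
FROM rs
FULL OUTER JOIN sf ON rs.region = sf.region
WHERE ABS(COALESCE(rs.revenue, 0) - COALESCE(sf.revenue, 0))
      > 0.001 * GREATEST(ABS(COALESCE(rs.revenue, 0)), 1);  -- 0.1% tolerance
```

An empty result set means the KPI passes its gate for that window.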
```sql
-- Window-level row counts (note: ROWS is reserved in Snowflake,
-- so the count is aliased as row_count)
SELECT
    TO_DATE(updated_at) AS d,
    COUNT(*) AS row_count
FROM MART.FACT_ORDERS
WHERE TO_DATE(updated_at) BETWEEN :start_d AND :end_d
GROUP BY 1
ORDER BY 1;
```

Common pitfalls
- Validating only the backfill: parity on a static snapshot doesn’t prove correctness under retries and late updates.
- No cost gate: credit spikes slip through because bounded apply scope isn’t validated.
- DELETE+INSERT semantics assumed: idempotency breaks under partial failures and reruns.
- Unstable ordering: dedupe/top-N lacks tie-breakers; reruns drift.
- No tolerance model: teams argue about diffs because thresholds weren’t defined upfront.
- Cost-blind deep diffs: exhaustive row-level diffs can be expensive; use layered gates (cheap→deep).
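When a cheap gate flags a window, a targeted row-level diff scoped to that window keeps deep comparison affordable. A sketch, assuming the Redshift-produced mart is staged side by side as MIGRATION.RS_FACT_ORDERS; names are placeholders:

```sql
-- Illustrative targeted row diff: rows present on the Redshift side but
-- missing or different on the Snowflake side, for one drifting day only.
SELECT order_id, status, amount
FROM MIGRATION.RS_FACT_ORDERS
WHERE TO_DATE(updated_at) = :drift_d
MINUS
SELECT order_id, status, amount
FROM MART.FACT_ORDERS
WHERE TO_DATE(updated_at) = :drift_d;
```

Run the MINUS in both directions to catch extra rows as well as missing ones.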
Validation approach
Gate set (layered)
Gate 0 — Readiness
- Databases/schemas/roles exist and are permissioned
- Dependent assets deployed (UDFs/procs, reference data, control tables)
Gate 1 — Execution
- Converted jobs compile and run reliably
- Deterministic ordering + explicit casts enforced
Gate 2 — Structural parity
- Row counts by window/partition
- Null/min/max/distinct profiles for key columns
Gate 3 — KPI parity
- KPI aggregates by key dimensions
- Rankings/top-N validated on tie/edge cohorts
Gate 4 — Cost posture (mandatory)
- Bounded apply scope verified on representative windows
- Credit burn and runtime remain within agreed thresholds
- Regression alerts defined for credit spikes
Gate 5 — Operational integrity (mandatory)
- Idempotency: rerun same window/batch → no net change
- Restart simulation: fail mid-run → resume → correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections → only expected rows change
- Dedupe stability: duplicates eliminated consistently under retries
Gate 6 — Cutover & monitoring
- Canary criteria + rollback triggers
- Post-cutover monitors: latency, failures, warehouse credits, KPI sentinels
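The Gate 5 idempotency check can be driven by an order-independent state hash: capture it, rerun the window's load, and compare. A sketch with placeholder names:

```sql
-- Illustrative idempotency check: HASH_AGG is order-insensitive, so the
-- same set of rows always produces the same hash.
SELECT HASH_AGG(order_id, status, amount, updated_at) AS state_hash
FROM MART.FACT_ORDERS
WHERE updated_at >= :window_start
  AND updated_at <  :window_end;
-- ...rerun the same window's load job, then re-run this query...
-- An unchanged state_hash means the rerun was a no-op, as required.
```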
Migration steps
- 01
Define parity, incremental, and cost contracts
Decide what must match (tables, dashboards, KPIs) and define tolerances. Make upsert semantics, windows, and late-data policy explicit. Set credit/runtime thresholds for top workloads.
- 02
Create validation datasets and edge cohorts
Select representative windows and cohorts that trigger edge behavior (ties, null-heavy segments, boundary dates) and operational stress cases (reruns/backfills/late updates).
- 03
Implement layered gates
Start with cheap checks (counts/profiles), then KPI diffs, then deep diffs only where needed. Add credit baselines and regression thresholds to prevent spikes.
- 04
Run operational integrity simulations
Rerun the same window, simulate partial failure and resume, replay backfills, and inject late updates. Verify only expected rows change and dedupe is stable.
- 05
Gate cutover and monitor
Establish canary/rollback criteria and post-cutover monitors for KPIs, credit burn, latency, and failures.
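The credit baselines from step 03 can be sourced from Snowflake's ACCOUNT_USAGE share; metering data lands with some latency, and the warehouse name here is a placeholder:

```sql
-- Illustrative daily credit baseline for one warehouse over two weeks;
-- compare post-migration days against this to catch regressions.
SELECT
    TO_DATE(start_time) AS d,
    SUM(credits_used)   AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'ETL_WH'
  AND start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;
```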
We define parity + upsert + cost contracts, build golden queries, and implement layered reconciliation gates—so Redshift→Snowflake cutover is gated by evidence and credit stability.
Get a validation plan, runnable gates, and sign-off artifacts (diff reports, thresholds, credit baselines, monitors) so Redshift→Snowflake cutover is controlled and dispute-proof.