Redshift ETL pipelines to Snowflake
Re-home incremental ETL—staging, dedupe, UPSERT patterns, and orchestration—from Redshift into Snowflake with an explicit run contract and validation gates that prevent KPI drift and credit spikes.
- Input
- Redshift ETL / pipeline migration logic
- Output
- Snowflake equivalent (validated)
Why this breaks
Redshift ETL systems often hide correctness in operational conventions: watermark tables, VACUUM/ANALYZE-driven assumptions, and UPSERT patterns implemented via staging + delete/insert. Snowflake can implement equivalent outcomes—but the run contract must be explicit: windows, keys, deterministic ordering, dedupe rules, and restart semantics.
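Making the watermark part of that contract explicit can be as small as one persisted control table plus a bounded read. A minimal sketch, assuming a hypothetical CONTROL.HIGH_WATER_MARKS table and an orders feed with an `updated_at` column:

```sql
-- Hypothetical per-job high-water-mark table (names are illustrative)
CREATE TABLE IF NOT EXISTS CONTROL.HIGH_WATER_MARKS (
    job_name   STRING        NOT NULL,
    hwm_value  TIMESTAMP_NTZ NOT NULL,
    updated_at TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (job_name)
);

-- Read a bounded window anchored on the persisted mark,
-- never on job runtime or CURRENT_TIMESTAMP
SELECT s.*
FROM STAGING.ORDERS s
WHERE s.updated_at >= (SELECT hwm_value
                       FROM CONTROL.HIGH_WATER_MARKS
                       WHERE job_name = 'orders_load')
  AND s.updated_at <  DATEADD('hour', 1,
                       (SELECT hwm_value
                        FROM CONTROL.HIGH_WATER_MARKS
                        WHERE job_name = 'orders_load'));
```

Advancing `hwm_value` only after a successful, recorded apply is what makes reruns safe: a retry re-reads the same window instead of a new, implicit one.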
Common symptoms after cutover:
- Duplicates or missing updates because watermarks and tie-breakers were implicit
- DELETE+INSERT UPSERT logic drifts under retries and partial failures
- Late-arrival behavior changes because reprocessing windows weren’t defined
- Credit spikes because MERGE/apply touches too much history (full-target scans)
- Orchestration differences change retries/dependencies, turning failures into silent data issues
How conversion works
- Inventory & classify ETL jobs, schedules, dependencies, and operational controls (watermarks, control tables, retries).
- Extract the run contract: business keys, incremental boundaries, dedupe tie-breakers, late-arrival policy, and failure/restart semantics.
- Re-home transformations to Snowflake staging (landing → typed staging → dedupe → apply) with bounded MERGE scopes.
- Implement restartability: applied-window/batch tracking, idempotency markers, deterministic ordering, and safe retries.
- Re-home orchestration (Airflow/dbt/your runner) with explicit DAG dependencies, retries, alerts, and warehouse isolation for batch vs BI.
- Gate cutover with evidence: golden outputs + incremental integrity simulations (reruns, backfills, late injections) and rollback-ready criteria.
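The staging → dedupe → apply steps above can be sketched as one bounded MERGE; table names, key columns, and the `:window_*` bind-variable placeholders are illustrative, not a prescribed schema:

```sql
MERGE INTO CORE.ORDERS t
USING (
    SELECT order_id, amount, status, updated_at
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY s.order_id
                   -- stable tie-breakers: latest timestamp, then source sequence
                   ORDER BY s.updated_at DESC, s.source_seq DESC
               ) AS rn
        FROM STAGING.ORDERS s
        WHERE s.updated_at >= :window_start   -- bounded apply scope:
          AND s.updated_at <  :window_end     -- never scan full target history
    )
    WHERE rn = 1
) src
ON t.order_id = src.order_id
WHEN MATCHED AND src.updated_at >= t.updated_at THEN
    UPDATE SET amount = src.amount, status = src.status, updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, amount, status, updated_at)
    VALUES (src.order_id, src.amount, src.status, src.updated_at);
```

The `src.updated_at >= t.updated_at` guard on the matched branch is what keeps a retried window from regressing rows that a later window already updated.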
Supported constructs
Representative Redshift ETL constructs we commonly migrate to Snowflake (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| Staging + DELETE/INSERT UPSERT patterns | Snowflake MERGE with bounded apply windows | Avoid full scans and ensure idempotency under retries. |
| Watermark/control tables | Explicit high-water marks + applied-window tracking | Restartable and auditable incremental behavior. |
| ROW_NUMBER-based dedupe patterns | Deterministic dedupe with explicit tie-breakers | Prevents nondeterministic drift under retries. |
| SCD Type-1 / Type-2 logic | MERGE + current-flag/end-date patterns | Backfills and late updates validated as first-class scenarios. |
| Copy/load ingestion chains | Landing tables + batch manifests | Ingestion made explicit; replayability supported. |
| Scheduler-driven chains | Airflow/dbt orchestration with warehouse isolation | Retries, concurrency, and alerts modeled and monitored. |
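For the SCD Type-2 row in the table above, a common close-out/insert pattern looks like the following sketch; column names are hypothetical and change detection is assumed to go through a precomputed attribute hash:

```sql
-- Close out current versions whose tracked attributes changed
UPDATE DIM.CUSTOMER d
SET is_current = FALSE,
    end_date   = s.effective_date
FROM STAGING.CUSTOMER s
WHERE d.customer_id = s.customer_id
  AND d.is_current
  AND d.attr_hash <> s.attr_hash;

-- Open new current versions for changed or brand-new keys
INSERT INTO DIM.CUSTOMER (customer_id, name, attr_hash, start_date, end_date, is_current)
SELECT s.customer_id, s.name, s.attr_hash, s.effective_date, NULL, TRUE
FROM STAGING.CUSTOMER s
LEFT JOIN DIM.CUSTOMER d
  ON d.customer_id = s.customer_id AND d.is_current
WHERE d.customer_id IS NULL          -- brand-new key
   OR d.attr_hash <> s.attr_hash;    -- changed attributes
```

Running both statements in one transaction, ordered close-out first, keeps a mid-run failure from leaving a key with zero or two current rows.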
How workload changes
| Topic | Redshift | Snowflake |
|---|---|---|
| Incremental correctness | Often encoded in staging + delete/insert conventions and job timing | Explicit high-water marks + deterministic staging + integrity gates |
| Upserts | DELETE+INSERT common for UPSERT semantics | MERGE with bounded apply windows |
| Cost predictability | Cluster utilization and WLM behavior | Warehouse credits + pruning effectiveness |
| Orchestration | Schedulers + script chains | Airflow/dbt with explicit DAG contracts |
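The warehouse-isolation posture in the table can be as simple as separate compute for batch and BI; sizes and names here are illustrative, not a recommendation:

```sql
-- Batch ETL gets its own credits and queue; retries can't starve dashboards
CREATE WAREHOUSE IF NOT EXISTS ETL_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60        -- seconds idle before suspending (limits spend)
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 60
  AUTO_RESUME    = TRUE;

-- Point batch sessions at the ETL warehouse explicitly
USE WAREHOUSE ETL_WH;
```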
Examples
Canonical Snowflake incremental apply pattern for Redshift-style pipelines: stage → dedupe deterministically → bounded MERGE + applied-batch tracking. Adjust keys, offsets, and casts to your model.
```sql
-- Applied-batch tracking (restartability)
CREATE TABLE IF NOT EXISTS CONTROL.APPLIED_BATCHES (
    job_name   STRING        NOT NULL,
    batch_id   STRING        NOT NULL,
    applied_at TIMESTAMP_NTZ NOT NULL,
    PRIMARY KEY (job_name, batch_id)
);
```
Common pitfalls
- DELETE+INSERT drift: partial failures and retries double-apply or delete wrong rows without idempotency guards.
- Implicit watermarks: using job runtime or CURRENT_TIMESTAMP instead of persisted high-water marks.
- Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers causes drift under retries.
- Full-target MERGE: missing apply boundaries causes large scans and credit spikes.
- Schema evolution surprises: upstream types widen/change; typed targets break without a drift policy.
- No warehouse isolation: BI refresh shares warehouse with ETL; tail latency and spend spikes follow.
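The idempotency guards these pitfalls call for can lean on the CONTROL.APPLIED_BATCHES table from the Examples section: check for the batch, then apply and record it in one transaction. A sketch, with the job name and `:batch_id` bind placeholder illustrative:

```sql
-- Guard: has this batch already been applied?
SELECT COUNT(*) AS already_applied
FROM CONTROL.APPLIED_BATCHES
WHERE job_name = 'orders_load' AND batch_id = :batch_id;

-- If zero, apply and record atomically; a retry after a mid-run
-- failure re-enters here and the guard sees no recorded batch
BEGIN;
-- ... bounded MERGE for this batch's window goes here ...
INSERT INTO CONTROL.APPLIED_BATCHES (job_name, batch_id, applied_at)
VALUES ('orders_load', :batch_id, CURRENT_TIMESTAMP());
COMMIT;
```

Because the MERGE and the tracking insert commit together, a batch is either fully applied and recorded or neither, which is exactly what makes blind retries safe.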
Validation approach
- Execution checks: pipelines run reliably under representative volumes and schedules.
- Structural parity: window-level row counts and column profiles (null/min/max/distinct) for key tables.
- KPI parity: aggregates by key dimensions for critical marts and dashboards.
- Incremental integrity (mandatory):
- Idempotency: rerun same window/batch → no net change
- Restart simulation: fail mid-run → resume → correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections → only expected rows change
- Dedupe stability: duplicates eliminated consistently under retries
- Cost/performance gates: bounded MERGE scope verified; credit/runtime thresholds set for top jobs.
- Operational readiness: retry/alerting tests, canary gates, and rollback criteria defined before cutover.
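The idempotency gate above can be expressed as a query over before/after snapshots; the snapshot names and column list are illustrative:

```sql
-- Snapshot the target before the rerun (zero-copy clone keeps this cheap)
CREATE OR REPLACE TABLE QA.ORDERS_BEFORE CLONE CORE.ORDERS;

-- ... rerun the same window/batch ...

-- Idempotency: the symmetric difference after the rerun must be empty
(SELECT order_id, amount, status, updated_at FROM QA.ORDERS_BEFORE
 MINUS
 SELECT order_id, amount, status, updated_at FROM CORE.ORDERS)
UNION ALL
(SELECT order_id, amount, status, updated_at FROM CORE.ORDERS
 MINUS
 SELECT order_id, amount, status, updated_at FROM QA.ORDERS_BEFORE);
```

Zero rows back means the rerun produced no net change; any row that does come back identifies the drifting key directly.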
Migration steps
1. Inventory ETL jobs, schedules, and dependencies
   Extract job chains, upstream/downstream dependencies, SLAs, retry policies, and control-table conventions. Identify business-critical marts and consumers.
2. Formalize the run contract
   Define load windows/high-water marks, business keys, deterministic ordering/tie-breakers, dedupe rules, late-arrival policy, and restart semantics. Make idempotency explicit.
3. Rebuild staging and apply on Snowflake
   Implement landing → typed staging → dedupe → bounded MERGE apply. Define a schema evolution policy (widen/quarantine/reject) and explicit DQ checks.
4. Re-home orchestration and warehouse posture
   Implement DAGs with retries/alerts and isolate batch warehouses from BI. Add applied-batch tracking and failure handling.
5. Run parity and integrity gates
   Golden outputs + KPI aggregates, idempotency reruns, restart simulations, late-data injections, and backfill windows. Cut over only when thresholds pass and rollback criteria are defined.
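The late-data injection in the final step can be scripted as: snapshot, inject a backdated correction, rerun the affected window, then assert that only the injected key changed. All names and the sample values below are illustrative:

```sql
-- Snapshot before the test
CREATE OR REPLACE TABLE QA.ORDERS_BEFORE_LATE CLONE CORE.ORDERS;

-- Inject a correction that arrives three days late
INSERT INTO LANDING.ORDERS (order_id, amount, status, updated_at)
VALUES (1001, 99.50, 'corrected', DATEADD('day', -3, CURRENT_TIMESTAMP()));

-- ... rerun the apply for the window covering the backdated timestamp ...

-- Only the injected key should differ between snapshots
SELECT COALESCE(b.order_id, a.order_id) AS changed_key
FROM QA.ORDERS_BEFORE_LATE b
FULL OUTER JOIN CORE.ORDERS a
  ON  b.order_id   = a.order_id
  AND b.amount     = a.amount
  AND b.status     = a.status
  AND b.updated_at = a.updated_at
WHERE b.order_id IS NULL OR a.order_id IS NULL;
```

If any key other than the injected one appears, the reprocessing window is touching rows the late-arrival policy says it should not.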
We inventory your Redshift ETL estate, formalize the run contract (windows, restartability, SCD), migrate a representative pipeline end-to-end, and deliver parity evidence with cutover gates.
Get an actionable migration plan with restart simulations, incremental integrity tests, reconciliation evidence, and credit baselines—so ETL cutover is controlled and dispute-proof.