Workload

Redshift ETL pipelines to Snowflake

Re-home incremental ETL—staging, dedupe, UPSERT patterns, and orchestration—from Redshift into Snowflake with an explicit run contract and validation gates that prevent KPI drift and credit spikes.

At a glance
Input
Redshift ETL / pipeline migration logic
Output
Snowflake equivalent (validated)
Common pitfalls
  • DELETE+INSERT drift: partial failures and retries double-apply or delete wrong rows without idempotency guards.
  • Implicit watermarks: using job runtime or CURRENT_TIMESTAMP instead of persisted high-water marks.
  • Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers causes drift under retries.
Context

Why this breaks

Redshift ETL systems often hide correctness in operational conventions: watermark tables, VACUUM/ANALYZE-driven assumptions, and UPSERT patterns implemented via staging + delete/insert. Snowflake can implement equivalent outcomes—but the run contract must be explicit: windows, keys, deterministic ordering, dedupe rules, and restart semantics.

Common symptoms after cutover:

  • Duplicates or missing updates because watermarks and tie-breakers were implicit
  • DELETE+INSERT UPSERT logic drifts under retries and partial failures
  • Late-arrival behavior changes because reprocessing windows weren’t defined
  • Credit spikes because MERGE/apply touches too much history (full-target scans)
  • Orchestration differences change retries/dependencies, turning failures into silent data issues
Approach

How conversion works

  1. Inventory & classify ETL jobs, schedules, dependencies, and operational controls (watermarks, control tables, retries).
  2. Extract the run contract: business keys, incremental boundaries, dedupe tie-breakers, late-arrival policy, and failure/restart semantics.
  3. Re-home transformations to Snowflake staging (landing → typed staging → dedupe → apply) with bounded MERGE scopes.
  4. Implement restartability: applied-window/batch tracking, idempotency markers, deterministic ordering, and safe retries.
  5. Re-home orchestration (Airflow/dbt/your runner) with explicit DAG dependencies, retries, alerts, and warehouse isolation for batch vs BI.
  6. Gate cutover with evidence: golden outputs + incremental integrity simulations (reruns, backfills, late injections) and rollback-ready criteria.
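The watermark side of this run contract (steps 2 and 4) can be sketched as follows. This is a minimal illustration, not a fixed schema: the CONTROL.WATERMARKS table, the job name, and the source table/columns are all hypothetical.

```sql
-- Hypothetical watermark table; names are illustrative.
CREATE TABLE IF NOT EXISTS CONTROL.WATERMARKS (
  job_name       STRING        NOT NULL,
  high_watermark TIMESTAMP_NTZ NOT NULL,
  updated_at     TIMESTAMP_NTZ NOT NULL,
  PRIMARY KEY (job_name)
);

-- Read the persisted boundary (never CURRENT_TIMESTAMP) for this run.
SET wm = (SELECT high_watermark FROM CONTROL.WATERMARKS
          WHERE job_name = 'orders_incremental');

-- Capture the upper bound once, so retries see the exact same slice.
SET wm_next = (SELECT MAX(updated_at) FROM RAW.ORDERS WHERE updated_at > $wm);

-- ... stage, dedupe, and apply rows
--     WHERE updated_at > $wm AND updated_at <= $wm_next ...

-- Advance the watermark only after the apply has committed.
UPDATE CONTROL.WATERMARKS
   SET high_watermark = $wm_next, updated_at = CURRENT_TIMESTAMP()
 WHERE job_name = 'orders_incremental';
```

Persisting both bounds is what makes the window restartable: a retried run re-reads the same `$wm`/`$wm_next` pair instead of recomputing "now".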

Supported constructs

Representative Redshift ETL constructs we commonly migrate to Snowflake (exact coverage depends on your estate).

| Source | Target | Notes |
| --- | --- | --- |
| Staging + DELETE/INSERT UPSERT patterns | Snowflake MERGE with bounded apply windows | Avoid full scans and ensure idempotency under retries. |
| Watermark/control tables | Explicit high-water marks + applied-window tracking | Restartable and auditable incremental behavior. |
| ROW_NUMBER-based dedupe patterns | Deterministic dedupe with explicit tie-breakers | Prevents nondeterministic drift under retries. |
| SCD Type-1 / Type-2 logic | MERGE + current-flag/end-date patterns | Backfills and late updates validated as first-class scenarios. |
| Copy/load ingestion chains | Landing tables + batch manifests | Ingestion made explicit; replayability supported. |
| Scheduler-driven chains | Airflow/dbt orchestration with warehouse isolation | Retries, concurrency, and alerts modeled and monitored. |
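As one example of the dedupe row above, a deterministic version of the ROW_NUMBER pattern might look like this sketch. The table and columns (STG.ORDERS, `source_seq`, `load_file_row_id`) are hypothetical stand-ins for whatever ordering columns your source actually provides.

```sql
-- Deterministic dedupe: the ORDER BY chain ends in a column that is unique
-- per row, so reruns always pick the same survivor. Names are illustrative.
CREATE OR REPLACE TABLE STG.ORDERS_DEDUP AS
SELECT * EXCLUDE (rn)
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (
           PARTITION BY order_id                -- business key
           ORDER BY updated_at DESC,            -- latest version wins
                    source_seq DESC,            -- stable tie-breaker
                    load_file_row_id DESC       -- last resort: unique per row
         ) AS rn
  FROM STG.ORDERS s
)
WHERE rn = 1;
```

The point of the final tie-breaker is that two rows can share both `updated_at` and `source_seq`; without a unique last column, the survivor can flip between retries.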

How workload changes

| Topic | Redshift | Snowflake |
| --- | --- | --- |
| Incremental correctness | Often encoded in staging + delete/insert conventions and job timing | Explicit high-water marks + deterministic staging + integrity gates |
| Upserts | DELETE+INSERT common for UPSERT semantics | MERGE with bounded apply windows |
| Cost predictability | Cluster utilization and WLM behavior | Warehouse credits + pruning effectiveness |
| Orchestration | Schedulers + script chains | Airflow/dbt with explicit DAG contracts |

  • Incremental correctness: correctness becomes auditable and repeatable under retries/backfills.
  • Upserts: avoid full-target scans and protect idempotency.
  • Cost predictability: bounded applies + isolation keep credit burn stable.
  • Orchestration: retries, alerts, and dependencies become measurable.

Examples

Canonical Snowflake incremental apply pattern for Redshift-style pipelines: stage → dedupe deterministically → bounded MERGE + applied-batch tracking. Adjust keys, offsets, and casts to your model.

```sql
-- Applied-batch tracking (restartability)
CREATE TABLE IF NOT EXISTS CONTROL.APPLIED_BATCHES (
  job_name   STRING        NOT NULL,
  batch_id   STRING        NOT NULL,
  applied_at TIMESTAMP_NTZ NOT NULL,
  PRIMARY KEY (job_name, batch_id)
);
```
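The apply step that pairs with this tracking table could look like the sketch below: skip already-applied batches, run a bounded MERGE, and record the batch in the same transaction. MART.ORDERS, STG.ORDERS_DEDUP, and the session variables are illustrative assumptions, and the sketch assumes `order_date` is immutable so the join predicate can bound the target scan.

```sql
-- Retry guard: the orchestrator skips this apply when the batch is already
-- recorded, e.g. by first running:
--   SELECT COUNT(*) FROM CONTROL.APPLIED_BATCHES
--   WHERE job_name = 'orders_incremental' AND batch_id = $batch_id;

BEGIN;

MERGE INTO MART.ORDERS t
USING (
  SELECT * FROM STG.ORDERS_DEDUP WHERE batch_id = $batch_id
) s
  ON  t.order_id = s.order_id
  -- order_date assumed immutable: this clause only prunes target partitions,
  -- it never turns an existing row into a NOT MATCHED (duplicate) insert.
  AND t.order_date = s.order_date
WHEN MATCHED THEN UPDATE SET
  t.status = s.status, t.amount = s.amount, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, order_date, status, amount, updated_at)
  VALUES (s.order_id, s.order_date, s.status, s.amount, s.updated_at);

-- Record the batch in the same transaction as the apply.
INSERT INTO CONTROL.APPLIED_BATCHES (job_name, batch_id, applied_at)
VALUES ('orders_incremental', $batch_id, CURRENT_TIMESTAMP());

COMMIT;
```

Because the batch record and the MERGE commit together, a failure before COMMIT leaves neither in place, and a retry simply re-runs the whole unit.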
Avoid

Common pitfalls

  • DELETE+INSERT drift: partial failures and retries double-apply or delete wrong rows without idempotency guards.
  • Implicit watermarks: using job runtime or CURRENT_TIMESTAMP instead of persisted high-water marks.
  • Non-deterministic dedupe: ROW_NUMBER without stable tie-breakers causes drift under retries.
  • Full-target MERGE: missing apply boundaries causes large scans and credit spikes.
  • Schema evolution surprises: upstream types widen/change; typed targets break without a drift policy.
  • No warehouse isolation: BI refresh shares warehouse with ETL; tail latency and spend spikes follow.
Proof

Validation approach

  • Execution checks: pipelines run reliably under representative volumes and schedules.
  • Structural parity: window-level row counts and column profiles (null/min/max/distinct) for key tables.
  • KPI parity: aggregates by key dimensions for critical marts and dashboards.
  • Incremental integrity (mandatory):
    • Idempotency: rerun same window/batch → no net change
    • Restart simulation: fail mid-run → resume → correct final state
    • Backfill: historical windows replay without drift
    • Late-arrival: inject late corrections → only expected rows change
    • Dedupe stability: duplicates eliminated consistently under retries
  • Cost/performance gates: bounded MERGE scope verified; credit/runtime thresholds set for top jobs.
  • Operational readiness: retry/alerting tests, canary gates, and rollback criteria defined before cutover.
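The mandatory idempotency check can be automated with an order-independent content hash of the affected window, compared before and after the rerun. One possible shape, with illustrative table, columns, and window variables:

```sql
-- Snapshot a content hash of the target slice before the rerun.
SET before_hash = (
  SELECT HASH_AGG(order_id, status, amount, updated_at)
  FROM MART.ORDERS
  WHERE order_date >= $window_start AND order_date < $window_end
);

-- ... rerun the same window/batch here ...

-- Identical hashes mean the rerun produced no net change.
SELECT IFF(
  $before_hash = (
    SELECT HASH_AGG(order_id, status, amount, updated_at)
    FROM MART.ORDERS
    WHERE order_date >= $window_start AND order_date < $window_end
  ),
  'PASS: idempotent rerun',
  'FAIL: net change detected'
) AS idempotency_gate;
```

The same comparison works for the backfill and late-arrival gates: hash the windows that should be untouched and assert they did not move.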
Execution

Migration steps

A sequence that keeps correctness measurable and cutover controlled.
  1. Inventory ETL jobs, schedules, and dependencies

     Extract job chains, upstream/downstream dependencies, SLAs, retry policies, and control-table conventions. Identify business-critical marts and consumers.

  2. Formalize the run contract

     Define load windows/high-water marks, business keys, deterministic ordering/tie-breakers, dedupe rules, late-arrival policy, and restart semantics. Make idempotency explicit.

  3. Rebuild staging and apply on Snowflake

     Implement landing → typed staging → dedupe → bounded MERGE apply. Define schema evolution policy (widen/quarantine/reject) and explicit DQ checks.

  4. Re-home orchestration and warehouse posture

     Implement DAGs with retries/alerts and isolate batch warehouses from BI. Add applied-batch tracking and failure handling.

  5. Run parity and integrity gates

     Golden outputs + KPI aggregates, idempotency reruns, restart simulations, late-data injections, and backfill windows. Cut over only when thresholds pass and rollback criteria are defined.
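Step 4's warehouse posture is plain DDL. A minimal sketch, assuming hypothetical warehouse names, sizes, and a monthly credit quota:

```sql
-- Separate warehouses so batch MERGEs never queue behind BI dashboards.
CREATE WAREHOUSE IF NOT EXISTS ETL_BATCH_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60       -- seconds idle before suspend; batch work is bursty
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 300
  AUTO_RESUME    = TRUE;

-- Cap surprise spend on the batch warehouse.
CREATE OR REPLACE RESOURCE MONITOR ETL_BATCH_MONITOR
  WITH CREDIT_QUOTA = 100
  TRIGGERS ON 90  PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE ETL_BATCH_WH SET RESOURCE_MONITOR = ETL_BATCH_MONITOR;
```

The quota and trigger thresholds are placeholders; the credit baselines established in step 5 are what should actually size them.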

Workload Assessment
Migrate Redshift ETL with restartability intact

We inventory your Redshift ETL estate, formalize the run contract (windows, restartability, SCD), migrate a representative pipeline end-to-end, and deliver parity evidence with cutover gates.

Migration Acceleration
Cut over pipelines with proof-backed gates

Get an actionable migration plan with restart simulations, incremental integrity tests, reconciliation evidence, and credit baselines—so ETL cutover is controlled and dispute-proof.

FAQ

Frequently asked questions

Is Redshift ETL migration just rewriting scripts?
No. The critical work is preserving the run contract: windows/watermarks, restartability, dedupe/tie-break rules, and UPSERT semantics under retries. Script translation is only one part.
How do you handle Redshift DELETE+INSERT UPSERT patterns?
We migrate them into Snowflake MERGE with bounded apply windows and deterministic staging, then validate idempotency so retries don’t double-apply or delete wrong rows.
How do you avoid Snowflake credit spikes for ETL?
We bound MERGE scope to affected windows, design pruning-aware staging, and isolate batch warehouses from BI. Validation includes credit/runtime baselines and regression thresholds for top jobs.
Can you keep pipelines incremental in Snowflake?
Yes. We implement explicit high-water marks and late-arrival windows, use bounded applies, and validate incremental integrity with reruns/backfills/late injections.