Teradata ETL pipelines to Snowflake
Re-home incremental ETL—staging, dedupe, APPLY/UPSERT patterns, and orchestration—from Teradata into Snowflake with an explicit run contract and validation gates that prevent KPI drift and credit spikes.
- Input: Teradata ETL / pipeline migration logic
- Output: Snowflake equivalent (validated)
Why this breaks
Teradata ETL systems often encode correctness in platform conventions: fast-load utilities, volatile/staging tables, and MERGE/upsert logic that assumes Teradata-specific type coercion and NULL semantics. Many estates also rely on control tables and restart steps that were never explicitly tested. Snowflake can implement equivalent outcomes—but only if the pipeline run contract is made explicit: windows/watermarks, keys, deterministic ordering, dedupe rules, late-arrival policy, and restartability under retries.
Common symptoms after cutover:
- Duplicates or missing updates because watermarks and tie-breakers were implicit
- MERGE behavior changes because match keys/casts/predicates differ subtly
- Late-arrival behavior changes because reprocessing windows weren’t defined
- Credit spikes because MERGE/apply touches too much history (full-target scans)
- Orchestration differences change retries/dependencies, turning failures into silent data issues
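Making the boundary explicit is the antidote to the first two symptoms: the extract reads past a stored high-water mark under a deterministic rule, rather than relying on load order or utility conventions. A minimal sketch, assuming hypothetical RAW.SALES_STG and CONTROL.HIGH_WATER_MARKS tables:

```sql
-- Illustrative: the incremental window comes from a control table, not from
-- script order or utility behavior, so reruns and restarts see the same boundary.
SELECT s.*
FROM RAW.SALES_STG s
WHERE s.updated_at > (
        SELECT wm.high_water_ts
        FROM CONTROL.HIGH_WATER_MARKS wm
        WHERE wm.job_name = 'sales_incremental'
      );
```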
How conversion works
- Inventory & classify ETL jobs, schedules, dependencies, and operational controls (watermarks, control tables, retries).
- Extract the run contract: business keys, incremental boundaries, deterministic ordering/tie-breakers, dedupe rule, late-arrival policy, and restart semantics.
- Re-home ingestion and staging: Snowflake landing tables, typed staging, and standardized audit columns (batch_id, ingested_at, source offsets); a landing-table sketch follows this list.
- Rebuild transforms using Snowflake-native patterns (stage → normalize/cast → dedupe → apply) with bounded MERGE scopes and explicit delete semantics where needed.
- Re-home orchestration (Airflow/dbt/your runner) with explicit DAG contracts, retries, alerts, and warehouse isolation for batch vs BI.
- Gate cutover with evidence: golden outputs + integrity simulations (reruns, backfills, late injections) and rollback-ready criteria.
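A minimal landing-table sketch for the ingestion step above; the table name, VARIANT payload, and audit columns are illustrative assumptions rather than a fixed schema:

```sql
-- Hypothetical landing table: raw payload plus standardized audit columns so
-- every batch is replayable and traceable back to its source offsets.
CREATE TABLE IF NOT EXISTS RAW.SALES_LANDING (
  payload       VARIANT,                                             -- raw record as received
  batch_id      STRING        NOT NULL,                              -- ingestion batch / manifest id
  ingested_at   TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP(),  -- load time
  source_system STRING,                                              -- originating system
  source_offset STRING                                               -- file name, CDC position, or queue offset
);
```

Typed staging then casts the payload into explicit column types before dedupe and apply, so coercion never happens implicitly.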
Supported constructs
Representative Teradata ETL constructs we commonly migrate to Snowflake (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| Teradata FastLoad/MultiLoad-style staging | Snowflake landing tables + batch manifests | Replayable ingestion boundaries with audit columns. |
| MERGE/upsert pipelines | Snowflake MERGE with bounded apply windows | Match keys, casts, and update predicates made explicit and testable. |
| Watermark/control tables | Explicit high-water marks + applied-window/batch tracking | Restartable and auditable incremental behavior. |
| ROW_NUMBER-based dedupe | Deterministic dedupe with explicit tie-breakers | Prevents nondeterministic drift under retries. |
| SCD Type-1 / Type-2 logic | MERGE + current-flag/end-date patterns | Backfills and late updates validated as first-class scenarios; Type-2 sketch below the table. |
| Scheduler-driven chains | Airflow/dbt orchestration with warehouse isolation | Retries, concurrency, and alerts modeled and monitored. |
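For the SCD Type-2 row, a common shape is a MERGE that end-dates changed current rows, followed by an insert of the new versions. A hedged sketch with hypothetical DW.DIM_CUSTOMER / STG.CUSTOMER_DELTA names and a precomputed attr_hash change detector:

```sql
-- Step 1 (illustrative names): close out current rows whose attributes changed.
MERGE INTO DW.DIM_CUSTOMER d
USING STG.CUSTOMER_DELTA s
  ON  d.customer_id = s.customer_id
  AND d.is_current  = TRUE
WHEN MATCHED AND d.attr_hash <> s.attr_hash THEN UPDATE SET
  is_current = FALSE,
  end_date   = s.effective_date;

-- Step 2: insert new current versions for keys just closed above and for keys
-- never seen before (unchanged keys still have a current row, so they skip).
INSERT INTO DW.DIM_CUSTOMER (customer_id, attr_hash, effective_date, end_date, is_current)
SELECT s.customer_id, s.attr_hash, s.effective_date, NULL, TRUE
FROM STG.CUSTOMER_DELTA s
LEFT JOIN DW.DIM_CUSTOMER d
       ON d.customer_id = s.customer_id
      AND d.is_current  = TRUE
WHERE d.customer_id IS NULL;
```

Real pipelines de-duplicate the delta first and exercise backfills and late updates as explicit validation scenarios.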
How workload changes
| Topic | Teradata | Snowflake |
|---|---|---|
| Incremental correctness | Often encoded in utilities, control tables, and conventions | Explicit windows + deterministic staging + integrity gates |
| MERGE semantics | Teradata casting/NULL semantics may differ subtly | Snowflake MERGE with explicit casts and predicates |
| Cost predictability | Platform capacity and spool behavior | Warehouse credits + bounded scans |
| Orchestration | Schedulers + script chains | Airflow/dbt with explicit DAG contracts and isolated warehouses (sketched below) |
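On the cost and orchestration rows, the usual first move is physically separating batch and BI compute; a sketch with placeholder warehouse names and sizes:

```sql
-- Hypothetical split: batch ETL and BI never share compute, so a heavy MERGE
-- cannot add tail latency to dashboards and spend is attributable per workload.
CREATE WAREHOUSE IF NOT EXISTS ETL_WH
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 60      -- seconds idle before suspending
  AUTO_RESUME    = TRUE;

CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 300
  AUTO_RESUME    = TRUE;
```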
Examples
Canonical Snowflake incremental apply pattern for Teradata-style pipelines: stage → dedupe deterministically → bounded MERGE + applied-batch tracking. Adjust keys, offsets, and casts to your model.
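First, the dedupe and apply steps: a hedged sketch assuming a hypothetical STG.SALES_BATCH staging table and a DW.FCT_SALES target keyed by order_id, with session variables ($batch_id, $window_start, $window_end) standing in for your orchestration parameters.

```sql
-- Deterministic dedupe (illustrative names): one row per business key, latest
-- change wins, with an explicit tie-breaker so retries pick the same winner.
CREATE OR REPLACE TEMPORARY TABLE STG.SALES_DEDUP AS
SELECT *
FROM STG.SALES_BATCH
WHERE batch_id = $batch_id
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY order_id
          ORDER BY updated_at DESC, source_offset DESC   -- explicit tie-breaker
        ) = 1;

-- Bounded MERGE: the apply scope is restricted to the batch's window so the
-- target is pruned instead of scanned end-to-end; the window must cover every
-- key the batch can touch under your late-arrival policy.
MERGE INTO DW.FCT_SALES t
USING STG.SALES_DEDUP s
  ON  t.order_id   = s.order_id
  AND t.order_date >= $window_start
  AND t.order_date <  $window_end
WHEN MATCHED THEN UPDATE SET
  amount     = CAST(s.amount AS NUMBER(18,2)),   -- explicit cast, no implicit coercion
  updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, order_date, amount, updated_at)
  VALUES (s.order_id, s.order_date, CAST(s.amount AS NUMBER(18,2)), s.updated_at);
```

The control table below records which batches have already been applied, so restarts and retries can skip work that has committed.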
```sql
-- Applied-batch tracking (restartability)
CREATE TABLE IF NOT EXISTS CONTROL.APPLIED_BATCHES (
  job_name   STRING        NOT NULL,
  batch_id   STRING        NOT NULL,
  applied_at TIMESTAMP_NTZ NOT NULL,
  PRIMARY KEY (job_name, batch_id)
);
```

Common pitfalls
- Implicit casts: Teradata type coercion differs; Snowflake needs explicit casts for stable outcomes.
- NULL semantics drift: match predicates and join keys behave differently unless made explicit (NULL-safe matching is sketched after this list).
- Non-deterministic dedupe: missing tie-breakers cause drift under retries.
- Full-target MERGE: missing apply boundaries causes large scans and credit spikes.
- Delete semantics lost: deletes/tombstones not modeled; downstream facts diverge.
- Schema evolution surprises: upstream fields widen/change; typed targets break without drift policy.
- No warehouse isolation: BI and batch share warehouses; tail latency and spend spikes follow.
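The cast and NULL pitfalls are usually closed by writing the comparison semantics into the predicate itself; a small sketch with illustrative column names, using Snowflake's EQUAL_NULL for NULL-safe matching:

```sql
-- Illustrative: joins that relied on Teradata coercion or NULL behavior are
-- rewritten with explicit casts and NULL-safe comparisons.
SELECT t.order_id, s.amount
FROM DW.FCT_SALES t
JOIN STG.SALES_DEDUP s
  ON  t.order_id  = s.order_id
  AND EQUAL_NULL(t.promo_code, s.promo_code)           -- NULL-safe match
  AND t.region_cd = CAST(s.region_cd AS VARCHAR(8));   -- explicit cast
-- EQUAL_NULL(a, b) behaves like "a IS NOT DISTINCT FROM b".
```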
Validation approach
- Execution checks: pipelines run reliably under representative volumes and schedules.
- Structural parity: window-level row counts and column profiles (null/min/max/distinct) for key tables; a parity probe is sketched after this list.
- KPI parity: aggregates by key dimensions for critical marts and dashboards.
- Incremental integrity (mandatory):
- Idempotency: rerun same window/batch → no net change
- Restart simulation: fail mid-run → resume → correct final state
- Backfill: historical windows replay without drift
- Late-arrival: inject late corrections → only expected rows change
- Dedupe stability: duplicates eliminated consistently under retries
- Cost/performance gates: bounded MERGE scope verified; credit/runtime thresholds set for top jobs.
- Operational readiness: retry/alerting tests, canary gates, and rollback criteria defined before cutover.
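A minimal window-level parity probe, usable both for structural parity and for the idempotency rerun (capture before and after the rerun and compare); table and column names are illustrative:

```sql
-- Order-independent fingerprint of the applied window: a rerun of the same
-- batch must reproduce the same row count and hash, or the gate fails.
SELECT COUNT(*)                  AS row_cnt,
       COUNT_IF(amount IS NULL)  AS null_amounts,
       MIN(order_date)           AS min_order_date,
       MAX(order_date)           AS max_order_date,
       HASH_AGG(*)               AS window_hash
FROM DW.FCT_SALES
WHERE order_date >= $window_start
  AND order_date <  $window_end;
```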
Migration steps
1. Inventory ETL jobs, utilities, and dependencies: extract job chains, upstream/downstream dependencies, SLAs, retry policies, and control-table conventions. Identify business-critical marts and consumers.
2. Formalize the run contract: define load windows/high-water marks, business keys, deterministic ordering/tie-breakers, dedupe rules, late-arrival policy, delete semantics, and restartability. Make idempotency explicit.
3. Rebuild staging and apply on Snowflake: implement landing → typed staging → dedupe → bounded MERGE apply, plus drift policy (widen/quarantine/reject) and explicit DQ checks where needed.
4. Re-home orchestration and warehouse posture: implement DAGs with retries/alerts and isolate batch warehouses from BI. Add applied-batch tracking and failure handling.
5. Run parity and integrity gates: golden outputs + KPI aggregates, idempotency reruns, restart simulations, late-data injections, and backfill windows. Cut over only when thresholds pass and rollback criteria are defined (a rerun-guard sketch follows these steps).
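The restart simulation in step 5 leans on the applied-batch table from the example above: a batch that has already been applied short-circuits to a no-op. A sketch, with the guard query and transactional bookkeeping shown in outline (job and variable names are assumptions):

```sql
-- Guard: if this (job, batch) pair is already recorded, the run exits without
-- touching the target, so retries and restarts cannot double-apply a batch.
SELECT COUNT(*) AS already_applied
FROM CONTROL.APPLIED_BATCHES
WHERE job_name = 'sales_incremental'
  AND batch_id = $batch_id;

-- On a fresh batch, the apply and the control-table insert commit together,
-- so a mid-run failure leaves neither half-applied data nor a stale marker.
BEGIN;
  -- ... bounded MERGE from the example above ...
  INSERT INTO CONTROL.APPLIED_BATCHES (job_name, batch_id, applied_at)
  VALUES ('sales_incremental', $batch_id, CURRENT_TIMESTAMP());
COMMIT;
```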
We inventory your Teradata ETL estate, formalize the run contract (windows, restartability, SCD), migrate a representative pipeline end-to-end, and deliver parity evidence with cutover gates.
Get an actionable migration plan with restart simulations, incremental integrity tests, reconciliation evidence, and credit baselines—so ETL cutover is controlled and dispute-proof.