Redshift SQL queries to BigQuery
Translate Redshift SQL—analytic/window patterns, date/time logic, and legacy DIST/SORT-era assumptions—into BigQuery Standard SQL with validation gates that prevent semantic drift and cost surprises.
- Input
- Redshift SQL / query migration logic
- Output
- BigQuery equivalent (validated)
- Common pitfalls
- Implicit casts: Redshift queries that “just work” can drift in BigQuery unless types are made explicit.
- NULL semantics in joins: join keys and filters behave differently when NULLs are involved unless null-safe equality is used deliberately.
- Window ordering ambiguity: ROW_NUMBER/RANK without stable tie-breakers causes nondeterministic drift.
Why this breaks
Redshift SQL estates often encode more than syntax: performance assumptions shaped by DISTKEY/SORTKEY, vacuum/analyze habits, and UPSERT patterns implemented via staging + delete/insert. BigQuery will compile many translated queries—but outputs and costs can drift when implicit casts, NULL semantics, and time handling aren’t made explicit.
Common symptoms after cutover:
- KPI drift from window logic and top-N selections with incomplete ordering
- Type coercion changes in CASE/COALESCE and join keys, producing silent mismatches
- Date/time edge cases shift (DATE vs TIMESTAMP intent, timezone boundaries)
- “UPSERT queries” translate syntactically but change behavior under retries
- Performance regressions because DIST/SORT-era query shapes don’t prune well in BigQuery
SQL migration must preserve both meaning and a BigQuery-native execution posture.
How conversion works
- Inventory & prioritize the SQL corpus (BI extracts, views, dbt/ELT models, app-embedded SQL). Rank by business impact, frequency, and complexity.
- Normalize Redshift dialect: quoting/identifiers, date literals, function idioms, and CTE normalization.
- Rewrite with rule-anchored mappings: function equivalents, window/analytic rewrites, explicit cast strategy, and deterministic ordering for top-N/windowed filters.
- Pattern libraries for high-risk constructs: NULL-safe comparisons, time semantics, UPSERT/delete-insert patterns, and string/regex behaviors.
- Validate with gates: compile/run checks, catalog/type alignment, and golden-query parity with edge-window sampling diffs.
- Performance-safe refactors: pruning-aware filters, join strategy adjustments, and pre-aggregation/materialization guidance for expensive BI queries.
Supported constructs
Representative Redshift SQL constructs we commonly convert to BigQuery Standard SQL (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| Window functions + top-N filters | Window functions + QUALIFY/LIMIT patterns | Deterministic ORDER BY and tie-breakers enforced. |
| DELETE+INSERT UPSERT queries | MERGE with bounded apply windows | Idempotency and match-key semantics made explicit. |
| Date/time arithmetic | DATE_ADD/DATE_DIFF/TIMESTAMP_* functions | DATE vs TIMESTAMP intent normalized explicitly. |
| NULL/type coercion idioms | Explicit casts + null-safe comparisons | Prevents join drift and filter selectivity changes. |
| String/regex functions | BigQuery string/regex equivalents | Edge-case behavior validated via golden cohorts. |
| DISTKEY/SORTKEY-driven query shapes | Pruning-aware rewrites + partitioning/clustering guidance | Replace physical tuning assumptions with BigQuery-native layout and pruning. |
How workload changes
| Topic | Redshift | BigQuery |
|---|---|---|
| Performance model | DISTKEY/SORTKEY + VACUUM/ANALYZE habits | Partition pruning + clustering + slot usage |
| Upserts | DELETE+INSERT patterns common | MERGE with bounded apply windows |
| Type behavior | Implicit casts often tolerated | Explicit casts recommended for stable outputs |
Examples
Representative Redshift → BigQuery rewrites for top-N, date arithmetic, and upsert patterns. Adjust identifiers and types to your schema.
-- Redshift: TOP-N with ORDER BY
SELECT
product_id,
SUM(revenue) AS rev
FROM sales
WHERE sale_dt BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY 1
ORDER BY rev DESC
LIMIT 100;Common pitfalls
- Implicit casts: Redshift queries that “just work” can drift in BigQuery unless types are made explicit.
- NULL semantics in joins: join keys and filters behave differently when NULLs are involved unless null-safe equality is used deliberately.
- Window ordering ambiguity: ROW_NUMBER/RANK without stable tie-breakers causes nondeterministic drift.
- DIST/SORT assumptions carried over: physical tuning treated as schema; BigQuery needs partitioning/clustering and pruning-aware SQL.
- UPSERT via DELETE+INSERT: naive rewrites break idempotency under retries and partial failures.
- Unbounded scans: missing partition filters cause bytes-scanned blowups and cost spikes.
Validation approach
- Compilation gates: converted queries compile under BigQuery Standard SQL.
- Catalog/type checks: referenced tables/columns exist; implicit casts are surfaced and made explicit.
- Golden-query parity: critical dashboards and queries match on outputs or agreed tolerances.
- KPI aggregates: compare aggregates by key dimensions (date, region, product, cohorts).
- Edge-cohort diffs: validate ties, null-heavy segments, boundary dates, and timezone transitions.
- Performance baseline: capture runtime/bytes scanned/slot time for top queries; set regression thresholds.
Migration steps
- 01
Collect and prioritize the query estate
Export BI SQL, view definitions, dbt models, and app-embedded queries. Rank by business impact, frequency, and risk patterns (UPSERTs, windows, time, implicit casts).
- 02
Define semantic contracts for risky patterns
Make tie-breakers, NULL handling, casting strategy, and date/time intent explicit—especially for top-N/window logic and UPSERT behavior.
- 03
Convert with rule-anchored mappings
Apply deterministic rewrites for common constructs and flag ambiguous intent with review markers (implicit casts, ordering ambiguity, regex/collation expectations).
- 04
Validate with golden queries and edge cohorts
Compile and run in BigQuery, compare KPI aggregates, and run targeted diffs on edge cohorts (ties, null-heavy segments, boundary dates).
- 05
Optimize top queries for BigQuery
Add pruning-aware filters, align partitioning/clustering to access paths, and recommend materializations for expensive BI workloads.
We inventory your SQL estate, convert a representative slice, and deliver parity evidence on golden queries—plus a risk register for constructs that carry business meaning.
Get a conversion plan, review markers, and validation artifacts so query cutover is gated by evidence and rollback-ready criteria.