Redshift stored procedures & UDFs to BigQuery
Re-home procedural logic—stored procedures, UDFs, and operational utilities—into BigQuery routines with an explicit behavior contract and validation harness so retries and backfills don’t change outcomes.
- Input: Redshift stored procedure / UDF logic
- Output: validated BigQuery equivalent
Why this breaks
In Redshift estates, procedural logic often carries the run contract: control tables, watermark advancement, dynamic SQL for partition/window refresh, and UDFs that embed business rules and typing assumptions. When migrated naïvely, routines may compile in BigQuery but drift in behavior—especially under retries, partial failures, and backfills.
Common symptoms after migration:
- UDF outputs drift due to type coercion and NULL handling differences
- Dynamic SQL changes behavior (quoting, identifier resolution, parameter binding)
- Error handling semantics differ; pipelines fail differently or silently continue
- Side effects (audit/control tables) aren’t recreated, breaking restartability
- Row-by-row procedural patterns become expensive and fragile in BigQuery
A successful migration turns implicit behavior into a testable contract and validates it with a replayable harness.
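To make that contract concrete: expressions that lean on implicit casts in Redshift should be rewritten with explicit types and NULL behavior in BigQuery. A minimal sketch, assuming a hypothetical tier-discount UDF (dataset and names are illustrative):

```sql
-- Mixed-type branches drift under implicit coercion; pin the return type.
CREATE OR REPLACE FUNCTION `proj.util.discount_rate`(tier STRING)
RETURNS NUMERIC AS (
  CASE tier
    WHEN 'gold'   THEN CAST(0.20 AS NUMERIC)
    WHEN 'silver' THEN CAST(0.10 AS NUMERIC)
    ELSE CAST(0 AS NUMERIC)  -- explicit: unknown or NULL tiers get 0, not NULL
  END
);
```

Declaring `RETURNS NUMERIC` and casting every branch turns the typing assumption into part of the routine's interface rather than an accident of coercion.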
How conversion works
- Inventory & classify procedures and UDFs, plus call sites across ETL and BI. Map dependencies (tables, views, control/audit tables).
- Extract the behavior contract: inputs/outputs, side effects, error semantics, state assumptions, and performance constraints.
- Choose target form per routine:
- BigQuery SQL UDF for pure expressions
- BigQuery JavaScript UDF for complex logic where JS is appropriate
- BigQuery stored procedure (SQL scripting) for multi-statement control flow
- Set-based refactor where procedural loops can be eliminated
- Rewrite dynamic SQL safely using parameter binding and explicit identifier rules.
- Validate with a harness: golden inputs/outputs, branch coverage, failure-mode tests, and side-effect assertions (control/audit writes).
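The dynamic-SQL step above can be sketched with `EXECUTE IMMEDIATE` and bound parameters; the procedure and table names below are hypothetical:

```sql
CREATE OR REPLACE PROCEDURE `proj.etl.refresh_partition`(run_date DATE)
BEGIN
  -- Values are bound with USING, never concatenated into the SQL string,
  -- which removes quoting drift and injection risk.
  EXECUTE IMMEDIATE """
    DELETE FROM `proj.mart.daily_sales` WHERE sale_date = @d
  """ USING run_date AS d;

  EXECUTE IMMEDIATE """
    INSERT INTO `proj.mart.daily_sales` (sale_date, amount)
    SELECT sale_date, SUM(amount)
    FROM `proj.raw.sales`
    WHERE sale_date = @d
    GROUP BY sale_date
  """ USING run_date AS d;
END;
```

Only identifiers (if they must vary) are formatted into the SQL text; all values travel through named parameters.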
Supported constructs
Representative Redshift procedural constructs we commonly migrate to BigQuery routines (exact coverage depends on your estate).
| Source | Target | Notes |
|---|---|---|
| Redshift UDFs (SQL) | BigQuery SQL UDFs | Pure expressions mapped with explicit casts and NULL behavior. |
| Redshift stored procedures | BigQuery stored procedures (SQL scripting) | Control flow rewritten; state and side effects modeled explicitly. |
| Dynamic SQL patterns | EXECUTE IMMEDIATE with parameter binding | Normalize identifier rules; reduce drift and injection risk. |
| Control tables for restartability | Applied-window tracking + idempotency markers | Reruns are safe and auditable under failures/backfills. |
| Utility macros in ETL scripts | Reusable views/UDFs/procedures | Consolidate reusable patterns into testable BigQuery assets. |
| Row-by-row transforms | Set-based SQL refactors | Avoids cost and reliability cliffs in BigQuery. |
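The last row of the table, replacing row-by-row loops with set-based SQL, is usually the highest-leverage rewrite. A hypothetical cursor-style loop that updates one customer at a time becomes a single MERGE (names illustrative):

```sql
-- Instead of looping over customers and issuing one UPDATE per row,
-- compute the aggregate once and apply it in one set-based statement.
MERGE `proj.mart.customer_totals` t
USING (
  SELECT customer_id, SUM(amount) AS total
  FROM `proj.raw.orders`
  GROUP BY customer_id
) s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.total = s.total
WHEN NOT MATCHED THEN
  INSERT (customer_id, total) VALUES (s.customer_id, s.total);
```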
How workload changes
| Topic | Redshift | BigQuery |
|---|---|---|
| Execution model | Procedures often rely on session context and ETL conventions | Routines should be explicit about state, inputs, and side effects |
| Dynamic SQL | Concatenated SQL strings common | Prefer parameter binding and explicit identifier rules |
| Performance | Row-by-row loops sometimes tolerated | Set-based refactors usually required for cost/latency |
| Restartability | Control tables and script chains encode rerun behavior | Idempotency markers + applied-window tracking enforced |
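One way to implement the applied-window tracking in the last row, assuming a hypothetical `etl_applied_windows` control table; reruns of the same window become no-ops:

```sql
CREATE OR REPLACE PROCEDURE `proj.etl.load_window`(window_start DATE, window_end DATE)
BEGIN
  -- Idempotency marker: skip windows already applied, so retries and
  -- backfills cannot double-apply.
  IF EXISTS (
    SELECT 1 FROM `proj.ops.etl_applied_windows`
    WHERE start_date = window_start AND end_date = window_end
      AND status = 'applied'
  ) THEN
    RETURN;
  END IF;

  -- Replace-by-window: delete-then-insert makes the load safe to rerun.
  DELETE FROM `proj.mart.events`
  WHERE event_date BETWEEN window_start AND window_end;

  INSERT INTO `proj.mart.events`
  SELECT * FROM `proj.raw.events`
  WHERE event_date BETWEEN window_start AND window_end;

  INSERT INTO `proj.ops.etl_applied_windows` (start_date, end_date, status, applied_at)
  VALUES (window_start, window_end, 'applied', CURRENT_TIMESTAMP());
END;
```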
Examples
Illustrative patterns for moving Redshift UDF/procedure logic into BigQuery routines. Adjust datasets, types, and identifiers to match your environment.
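One such pattern: a Redshift Python UDF carrying nontrivial string logic can be ported as a BigQuery JavaScript UDF (function name and normalization rules are illustrative):

```sql
-- BigQuery JavaScript UDF example: NULL inputs arrive as JS null.
CREATE OR REPLACE FUNCTION `proj.util.normalize_phone`(raw STRING)
RETURNS STRING
LANGUAGE js AS r"""
  if (raw === null) return null;
  const digits = raw.replace(/\D/g, '');
  return digits.length === 10 ? '+1' + digits : (digits || null);
""";
```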
```sql
-- BigQuery SQL UDF example
CREATE OR REPLACE FUNCTION `proj.util.safe_div`(n NUMERIC, d NUMERIC) AS (
  IF(d IS NULL OR d = 0, NULL, n / d)
);
```
Common pitfalls
- Hidden state coupling: reliance on temp tables, session variables, or search_path-like behavior.
- Mixed-type branches: CASE/IF returns mixed types; BigQuery requires explicit casts to preserve intent.
- NULL semantics drift: comparisons and string functions behave differently unless made explicit.
- Dynamic SQL injection risk: string concatenation without bindings/escaping causes drift and security risk.
- Side effects ignored: control-table and audit updates not modeled; retries/backfills double-apply or skip.
- Row-by-row loops: procedural loops that should be rewritten as set-based SQL for cost and reliability.
- No harness: without replayable tests, parity becomes a debate at cutover.
Validation approach
- Compile + interface checks: each routine deploys and signatures match the contract (args/return types).
- Golden tests: curated input sets validate outputs, including NULL-heavy and boundary cases.
- Branch + failure-mode coverage: expected failures (invalid inputs, missing rows) are tested.
- Side-effect verification: assert expected writes to control/log/audit tables and idempotency under retries.
- Integration replay: run routines within representative pipelines and compare downstream KPIs/aggregates.
- Performance gate: confirm no hidden row-by-row scans; set-based refactors validated with scan bytes/runtime baselines.
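Golden tests can be expressed directly in SQL scripting with ASSERT; a sketch assuming a hypothetical `safe_div_cases` table of curated inputs and expected outputs:

```sql
-- Golden test: the routine's output over curated inputs must match the
-- recorded expectations exactly; any divergent row fails the assertion.
ASSERT NOT EXISTS (
  (SELECT n, d, expected FROM `proj.test.safe_div_cases`)
  EXCEPT DISTINCT
  (SELECT n, d, `proj.util.safe_div`(n, d) FROM `proj.test.safe_div_cases`)
) AS 'safe_div diverges from golden expectations';
```

Including NULL-heavy and boundary rows (d = 0, d IS NULL) in the cases table makes the NULL contract part of the regression suite.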
Migration steps
1. Inventory routines and build the call graph: collect Redshift procedures and UDFs, their callers, and dependent objects. Identify side effects: control/audit tables, temp objects, and dynamic SQL paths.
2. Define the behavior contract: for each routine, specify inputs/outputs, expected errors, side effects, restart semantics, and performance expectations. Decide the target form (SQL UDF, JS UDF, procedure, or set-based refactor).
3. Convert logic with safety patterns: rewrite casts and NULL behavior explicitly, migrate dynamic SQL using parameter binding, and refactor row-by-row patterns into set-based SQL where feasible.
4. Build a validation harness: create golden inputs, boundary cases, and failure-mode tests. Validate outputs and side effects deterministically so parity isn't debated during cutover.
5. Integrate, replay, and cut over: run routines within representative pipelines, compare downstream KPIs, validate reruns/backfills, and cut over behind gates with rollback-ready criteria.
We inventory your Redshift procedures and UDFs, migrate a representative subset into BigQuery routines, and deliver a harness that proves parity—including side effects and rerun behavior.
You get an actionable conversion plan, review markers flagging ambiguous intent, and validation artifacts, so the procedural-logic cutover is gated by evidence and rollback criteria.