Migrating SSIS to Snowflake: DTSX Packages to Snowpark and Task Pipelines

SQL Server Integration Services (SSIS) has been the cornerstone of Microsoft-centric ETL for over two decades. Organizations running hundreds or thousands of DTSX packages face a familiar set of challenges: aging Windows servers hosting the SSIS Catalog, complex package configurations spread across XML files and environment variables, and a deployment model that demands dedicated infrastructure. As data warehousing shifts to cloud-native platforms, migrating SSIS packages to Snowflake represents both a modernization opportunity and a significant architectural transformation.

This article provides a detailed technical mapping of every major SSIS concept to its Snowflake-native equivalent. We cover Control Flow tasks, Data Flow transformations, connection managers, variables, logging, and the orchestration model — with code examples showing the before and after for each pattern. Whether you are running SSIS on-premises or through Azure Data Factory's SSIS Integration Runtime, the target architecture remains the same: Snowflake SQL, Snowpark Python, Snowflake Tasks, and Streams.

SSIS Architecture vs. Snowflake Architecture

SSIS packages (.dtsx files) are XML documents that define two primary layers: the Control Flow (orchestration) and the Data Flow (transformation). The Control Flow sequences tasks such as Execute SQL, Script Task, File System Task, and Data Flow Task. The Data Flow defines source-to-destination pipelines with inline transformations. Packages execute on the SSIS runtime, which manages buffers, threading, and logging through the Integration Services Catalog or legacy package deployment.

Snowflake replaces this entire stack with a fundamentally different model. There is no dedicated ETL server. Compute is provided by virtual warehouses that auto-suspend and auto-resume. Transformation logic runs as SQL statements, stored procedures, or Snowpark Python/Scala/Java programs — all executing on Snowflake's distributed compute engine. Orchestration is handled by Snowflake Tasks (CRON-based DAGs) and Streams (change data capture). Data ingestion uses Stages, Snowpipe, and COPY INTO rather than OLE DB or ADO.NET providers.

SSIS Concept	Snowflake Equivalent	Notes
DTSX Package	Stored Procedure / Snowpark Script	A single unit of ETL logic
Control Flow	Task DAG (Tasks with predecessors)	CRON-scheduled dependency chains
Data Flow Task	Snowpark Python pipeline / SQL	Transformation logic on Snowflake compute
Execute SQL Task	SQL statement / Stored Procedure	Direct SQL execution on warehouse
For Each Loop Container	Snowflake Scripting LOOP / CURSOR	Procedural iteration in SQL
Script Task (C#)	Snowpark UDF / Stored Procedure	Python, Java, or Scala logic
SSIS Variables	Session variables / Procedure params	SET variable = value
Connection Manager	Stage / Storage Integration	External stages for cloud storage
Derived Column	SQL expression / Snowpark withColumn	Inline computed columns
Conditional Split	CASE expression / FILTER	Row routing via SQL predicates
Lookup Transform	JOIN	No caching needed — all data in Snowflake
OLE DB Source/Destination	Snowflake tables / Stages	Native storage, no middleware
Flat File Source	Stage + File Format + COPY INTO	CSV, JSON, Parquet ingestion
SSIS Catalog	Snowflake schema + Task metadata	No separate catalog server required
Package Parameters	Procedure arguments / session vars	Parameterized execution
Precedence Constraints	Task predecessors (AFTER clause)	DAG-based dependency model
Event Handlers	Task error handling / SYSTEM$SEND_EMAIL	Alerting and error notification
Logging Providers	ACCOUNT_USAGE / TASK_HISTORY views	Built-in execution telemetry

SSIS to Snowflake migration — automated end-to-end by MigryX

Control Flow Migration: From Task Sequences to Snowflake Task DAGs

SSIS Control Flow defines the execution order of tasks within a package. Tasks are connected by precedence constraints that specify success, failure, or expression-based conditions. Containers (Sequence, For Each Loop, For Loop) group tasks and provide scoping for variables and transactions.

In Snowflake, the Task object replaces the entire Control Flow layer. Tasks can reference predecessor tasks using the AFTER clause, forming a directed acyclic graph (DAG). Each task runs a single SQL statement or calls a stored procedure. Conditional execution is achieved through the WHEN clause, which evaluates a boolean expression before the task body runs.

Execute SQL Task to Snowflake Stored Procedures

The Execute SQL Task is the most common SSIS Control Flow task. It runs a SQL statement against a connection manager and optionally captures results into SSIS variables. In Snowflake, this becomes a direct SQL statement within a stored procedure or a standalone Task.

-- SSIS Execute SQL Task (conceptual):
-- Connection: OLE DB to SQL Server
-- SQL: EXEC dbo.usp_load_daily_sales @run_date = ?
-- Parameter mapping: User::RunDate -> Parameter 0

-- Snowflake equivalent: Stored Procedure with parameters
CREATE OR REPLACE PROCEDURE etl.load_daily_sales(run_date DATE)
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
BEGIN
    -- Truncate staging area
    TRUNCATE TABLE staging.daily_sales;

    -- Load from raw to staging with date filter
    INSERT INTO staging.daily_sales (
        sale_id, product_id, customer_id, quantity,
        unit_price, sale_date, region
    )
    SELECT
        sale_id, product_id, customer_id, quantity,
        unit_price, sale_date, region
    FROM raw.pos_transactions
    WHERE sale_date = :run_date;

    -- Merge into target fact table
    MERGE INTO warehouse.fact_daily_sales t
    USING staging.daily_sales s
    ON t.sale_id = s.sale_id
    WHEN MATCHED THEN UPDATE SET
        t.quantity = s.quantity,
        t.unit_price = s.unit_price,
        t.updated_at = CURRENT_TIMESTAMP()
    WHEN NOT MATCHED THEN INSERT (
        sale_id, product_id, customer_id, quantity,
        unit_price, sale_date, region, created_at
    ) VALUES (
        s.sale_id, s.product_id, s.customer_id, s.quantity,
        s.unit_price, s.sale_date, s.region, CURRENT_TIMESTAMP()
    );

    RETURN 'Sales loaded for ' || :run_date::STRING;
END;

For Each Loop Container to Snowflake Scripting

SSIS For Each Loop Containers iterate over collections such as file enumerators, ADO recordsets, or variable lists. Each iteration assigns values to SSIS variables and executes the contained tasks. In Snowflake, this pattern is implemented using Snowflake Scripting (SQL stored procedures with LOOP, FOR, and CURSOR constructs).

-- SSIS For Each Loop: Iterate over files in a directory
-- Enumerator: Foreach File Enumerator
-- Folder: \\server\share\incoming\
-- File Pattern: *.csv
-- Variable mapping: User::CurrentFile

-- Snowflake equivalent: Iterate over staged files
CREATE OR REPLACE PROCEDURE etl.load_all_staged_files()
RETURNS STRING
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    file_count INTEGER DEFAULT 0;
    file_name STRING;
    file_cursor CURSOR FOR
        SELECT "name" FROM TABLE(RESULT_SCAN(
            LAST_QUERY_ID()
        ));
BEGIN
    -- List files in stage (equivalent to directory enumeration)
    LIST @raw_data_stage/incoming/ PATTERN='.*[.]csv';

    OPEN file_cursor;
    LOOP
        FETCH file_cursor INTO file_name;
        IF (NOT FOUND) THEN
            LEAVE;
        END IF;

        -- Load each file individually
        COPY INTO staging.incoming_data
        FROM @raw_data_stage/incoming/
        FILES = (:file_name)
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE';

        file_count := file_count + 1;
    END LOOP;
    CLOSE file_cursor;

    RETURN 'Loaded ' || file_count::STRING || ' files';
END;

Script Task to Snowpark Python UDFs and Procedures

SSIS Script Tasks execute C# or VB.NET code for custom logic that cannot be expressed in SQL — string parsing, API calls, file manipulation, custom validation. In Snowflake, Snowpark Python stored procedures and UDFs provide equivalent extensibility, running directly on Snowflake's compute engine without any external infrastructure.

# SSIS Script Task equivalent: Complex data validation with Python
# This Snowpark stored procedure replaces a C# Script Task that
# validates records, applies business rules, and logs exceptions

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, when, lit, current_timestamp
from snowflake.snowpark.types import StringType

def main(session: snowpark.Session) -> str:
    # Read staging data (equivalent to SSIS ReadOnlyVariables)
    df = session.table("staging.customer_updates")

    # Apply business rules (equivalent to C# validation logic)
    validated = df.with_column(
        "email_valid",
        when(col("email").rlike(r"^[^@]+@[^@]+\.[^@]+$"), lit(True))
        .otherwise(lit(False))
    ).with_column(
        "phone_clean",
        col("phone").regexp_replace(r"[^0-9]", "")
    ).with_column(
        "customer_tier",
        when(col("lifetime_value") >= 100000, lit("PLATINUM"))
        .when(col("lifetime_value") >= 50000, lit("GOLD"))
        .when(col("lifetime_value") >= 10000, lit("SILVER"))
        .otherwise(lit("BRONZE"))
    )

    # Route valid records to target (equivalent to Conditional Split)
    valid_records = validated.filter(col("email_valid") == True)
    invalid_records = validated.filter(col("email_valid") == False)

    # Write valid records
    valid_records.write.mode("overwrite").save_as_table(
        "warehouse.dim_customers_staging"
    )

    # Log invalid records for review
    invalid_records.select(
        col("customer_id"), col("email"), col("phone"),
        lit("INVALID_EMAIL").alias("error_type"),
        current_timestamp().alias("logged_at")
    ).write.mode("append").save_as_table("audit.validation_errors")

    valid_count = valid_records.count()
    invalid_count = invalid_records.count()

    return f"Validated: {valid_count} valid, {invalid_count} invalid"

MigryX: Purpose-Built Parsers for Every Legacy Technology

MigryX does not rely on generic text matching or regex-based parsing. For every supported legacy technology, MigryX has built a dedicated Abstract Syntax Tree (AST) parser that understands the full grammar and semantics of that platform. This means MigryX captures not just what the code does, but why — understanding implicit behaviors, default settings, and platform-specific quirks that generic tools miss entirely.

Data Flow Migration: From Buffer Pipelines to Snowpark and SQL

The SSIS Data Flow is the heart of the transformation engine. It defines a pipeline of sources, transformations, and destinations connected by paths. The runtime manages memory buffers, threading, and data type conversions. Each transformation operates on in-memory buffers row-by-row or in batches.

In Snowflake, the Data Flow concept dissolves entirely. There are no in-memory buffers or transformation pipelines. Instead, transformations are expressed as SQL queries or Snowpark DataFrame operations that execute on Snowflake's distributed compute. This is fundamentally more efficient because data does not leave Snowflake — there is no serialization, network transfer, or deserialization through an external ETL engine.

Complete SSIS Data Flow Pipeline to Snowpark Python

Consider a typical SSIS Data Flow that reads from an OLE DB source, applies Derived Column, Conditional Split, Lookup, and Aggregate transformations, then writes to an OLE DB destination. Below is the complete Snowpark Python equivalent demonstrating how a multi-component Data Flow collapses into a single cohesive script.

# Complete SSIS Data Flow replacement in Snowpark Python
#
# Original SSIS Data Flow:
#   OLE DB Source (orders) -->
#   Derived Column (calculate total, fiscal_quarter) -->
#   Lookup (product catalog) -->
#   Conditional Split (domestic vs international) -->
#   Aggregate (by region, quarter) -->
#   OLE DB Destination (summary table)

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import (
    col, sum as _sum, count, avg, max as _max,
    when, lit, quarter, year, current_timestamp
)

def main(session: snowpark.Session) -> str:
    # === OLE DB Source equivalent ===
    orders = session.table("bronze.sales_orders")

    # === Derived Column equivalent ===
    enriched = orders.with_column(
        "line_total", col("quantity") * col("unit_price")
    ).with_column(
        "fiscal_quarter",
        when(quarter(col("order_date")).isin([1, 2, 3]),
             year(col("order_date")))
        .otherwise(year(col("order_date")) + 1)
    ).with_column(
        "fiscal_q_label",
        when(quarter(col("order_date")) == 1, lit("Q1"))
        .when(quarter(col("order_date")) == 2, lit("Q2"))
        .when(quarter(col("order_date")) == 3, lit("Q3"))
        .otherwise(lit("Q4"))
    )

    # === Lookup equivalent (product catalog) ===
    products = session.table("ref.product_catalog")
    with_product = enriched.join(
        products,
        enriched["product_id"] == products["product_id"],
        "left"
    ).select(
        enriched["*"],
        products["product_name"],
        products["category"],
        products["product_tier"]
    )

    # === Conditional Split equivalent ===
    domestic = with_product.filter(col("ship_country") == "US")
    international = with_product.filter(col("ship_country") != "US")

    # === Aggregate equivalent (by region and quarter) ===
    domestic_summary = domestic.group_by(
        col("region"), col("fiscal_quarter"), col("fiscal_q_label")
    ).agg(
        _sum(col("line_total")).alias("total_revenue"),
        count(col("order_id")).alias("order_count"),
        avg(col("line_total")).alias("avg_order_value"),
        _max(col("order_date")).alias("latest_order")
    )

    international_summary = international.group_by(
        col("ship_country"), col("fiscal_quarter"), col("fiscal_q_label")
    ).agg(
        _sum(col("line_total")).alias("total_revenue"),
        count(col("order_id")).alias("order_count"),
        avg(col("line_total")).alias("avg_order_value"),
        _max(col("order_date")).alias("latest_order")
    )

    # === OLE DB Destination equivalent ===
    domestic_summary.write.mode("overwrite").save_as_table(
        "gold.domestic_sales_summary"
    )
    international_summary.write.mode("overwrite").save_as_table(
        "gold.international_sales_summary"
    )

    return "Data flow pipeline completed successfully"

Derived Column Transformation to SQL Expressions

SSIS Derived Column transformations add or replace columns using SSIS expression syntax. In Snowflake, these become SELECT expressions in SQL or with_column() calls in Snowpark. The SQL expression language is far more powerful than the SSIS expression language, supporting window functions, regular expressions, and semi-structured data navigation that would require Script Tasks in SSIS.

-- SSIS Derived Column:
--   full_name = first_name + " " + last_name
--   age = DATEDIFF("yy", birth_date, GETDATE())
--   status_flag = order_total > 1000 ? "HIGH" : "NORMAL"

-- Snowflake SQL equivalent
SELECT
    *,
    first_name || ' ' || last_name AS full_name,
    DATEDIFF('year', birth_date, CURRENT_DATE()) AS age,
    CASE WHEN order_total > 1000 THEN 'HIGH' ELSE 'NORMAL' END AS status_flag
FROM staging.customer_orders;

Conditional Split to CASE Expressions

SSIS Conditional Split routes rows to different outputs based on boolean expressions. Each output path connects to a separate transformation chain or destination. In Snowflake, this is expressed as CASE expressions for tagging within a single result set, or separate INSERT statements with WHERE filters when routing to different physical tables.

-- SSIS Conditional Split:
--   Output 1 (Priority): priority_level = "CRITICAL" AND sla_hours < 4
--   Output 2 (Standard): priority_level IN ("HIGH", "MEDIUM")
--   Default Output: All remaining rows

-- Snowflake SQL: Route to different tables using INSERT...SELECT
INSERT INTO ops.critical_tickets
SELECT * FROM staging.support_tickets
WHERE priority_level = 'CRITICAL' AND sla_hours < 4;

INSERT INTO ops.standard_tickets
SELECT * FROM staging.support_tickets
WHERE priority_level IN ('HIGH', 'MEDIUM')
  AND NOT (priority_level = 'CRITICAL' AND sla_hours < 4);

INSERT INTO ops.low_priority_tickets
SELECT * FROM staging.support_tickets
WHERE priority_level NOT IN ('CRITICAL', 'HIGH', 'MEDIUM')
   OR priority_level IS NULL;

Lookup Transformation to SQL JOIN

SSIS Lookup transformations perform reference data enrichment with configurable caching modes (Full Cache, Partial Cache, No Cache). In Snowflake, all reference data resides in the same platform, so lookups become standard JOINs with no caching configuration required. The query optimizer handles data access patterns automatically.

-- SSIS Lookup: Enrich orders with customer dimension
-- Cache Mode: Full Cache
-- Connection: OLE DB to SQL Server warehouse
-- Lookup Table: dim_customer
-- Join: customer_id = customer_id
-- Output: customer_name, customer_segment, account_manager

-- Snowflake SQL equivalent (no caching config needed)
CREATE OR REPLACE TABLE silver.enriched_orders AS
SELECT
    o.order_id,
    o.order_date,
    o.product_id,
    o.quantity,
    o.unit_price,
    o.quantity * o.unit_price AS line_total,
    c.customer_name,
    c.customer_segment,
    c.account_manager
FROM bronze.orders o
LEFT JOIN ref.dim_customer c
    ON o.customer_id = c.customer_id;

In SSIS, choosing the wrong Lookup cache mode can cause severe performance degradation — Full Cache loads the entire reference table into memory, while No Cache issues a query per row. In Snowflake, the query optimizer handles data access patterns automatically. There is no cache mode to configure, no memory allocation to tune, and no performance cliff when reference tables grow beyond available memory on the SSIS server.

Connection Managers to Stages and Storage Integrations

SSIS Connection Managers define connectivity to sources and destinations: OLE DB for databases, Flat File for CSV/text, ADO.NET for .NET providers, and SMTP for email notifications. Each connection manager stores credentials and connection strings, often parameterized through package configurations or SSIS Catalog environments.

In Snowflake, external data connectivity is handled through Stages (referencing cloud storage locations), Storage Integrations (managing authentication to cloud providers), and File Formats (defining how to parse files). Database-to-database connectivity is unnecessary because all data lives within the Snowflake platform.

-- SSIS Flat File Connection Manager equivalent:
-- Source: \\server\share\data\daily_export.csv
-- Column delimiter: comma, text qualifier: double-quote
-- Header row: yes

-- Snowflake: Create File Format + External Stage
CREATE OR REPLACE FILE FORMAT etl.csv_standard
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    SKIP_HEADER = 1
    NULL_IF = ('', 'NULL', 'null')
    EMPTY_FIELD_AS_NULL = TRUE
    ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;

-- Create Storage Integration for secure cloud access
CREATE OR REPLACE STORAGE INTEGRATION s3_data_integration
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789:role/snowflake-access'
    STORAGE_ALLOWED_LOCATIONS = ('s3://company-data/');

-- Create External Stage
CREATE OR REPLACE STAGE etl.data_stage
    URL = 's3://company-data/daily-exports/'
    STORAGE_INTEGRATION = s3_data_integration
    FILE_FORMAT = etl.csv_standard;

-- Load data (replaces entire SSIS package for file ingestion)
COPY INTO bronze.daily_export
FROM @etl.data_stage
PATTERN = '.*daily_export.*[.]csv'
ON_ERROR = 'CONTINUE';

-- Or use Snowpipe for continuous ingestion
CREATE OR REPLACE PIPE bronze.daily_export_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO bronze.daily_export
  FROM @etl.data_stage
  FILE_FORMAT = etl.csv_standard;

SSIS Variables to Snowflake Session Variables and Procedure Parameters

SSIS packages use variables extensively for parameterization, loop counters, result set capture, and inter-task communication. Variables have data types, scoping rules (package, container, task), and can be read-only or read-write. Package parameters provide external input at execution time.

In Snowflake, session variables (SET/UNSET), stored procedure parameters, and the IDENTIFIER() function provide equivalent functionality. Variables can be referenced in SQL using the $variable syntax or the :variable bind syntax inside stored procedures.

-- SSIS Package Parameters and Variables:
-- Parameter: RunDate (Date), Environment (String)
-- Variable: RowCount (Int32), ProcessStatus (String)

-- Snowflake: Session variables for ad-hoc execution
SET run_date = '2026-04-08'::DATE;
SET environment = 'PRODUCTION';
SET target_schema = 'warehouse';

-- Use variables in queries
CREATE OR REPLACE TABLE IDENTIFIER($target_schema || '.fact_sales') AS
SELECT *
FROM staging.validated_sales
WHERE sale_date = $run_date;

-- Snowflake: Stored procedure for parameterized execution
CREATE OR REPLACE PROCEDURE etl.run_daily_pipeline(
    run_date DATE,
    environment STRING DEFAULT 'PRODUCTION',
    full_refresh BOOLEAN DEFAULT FALSE
)
RETURNS VARIANT
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    row_count INTEGER;
    process_status STRING DEFAULT 'STARTED';
    result VARIANT;
BEGIN
    -- Equivalent to SSIS variable assignments during execution
    IF (:full_refresh) THEN
        TRUNCATE TABLE warehouse.fact_sales;
        process_status := 'FULL_REFRESH';
    END IF;

    -- Core load logic
    INSERT INTO warehouse.fact_sales
    SELECT * FROM staging.validated_sales
    WHERE sale_date = :run_date;

    row_count := SQLROWCOUNT;
    process_status := 'COMPLETED';

    -- Return result (equivalent to SSIS variable output)
    result := OBJECT_CONSTRUCT(
        'run_date', :run_date,
        'environment', :environment,
        'rows_loaded', row_count,
        'status', process_status,
        'completed_at', CURRENT_TIMESTAMP()
    );

    -- Log execution (equivalent to SSIS logging provider)
    INSERT INTO audit.pipeline_log (pipeline_name, run_date, result)
    VALUES ('daily_sales', :run_date, :result);

    RETURN result;
END;

Orchestrating the Full Pipeline: SSIS Package Execution to Snowflake Task DAGs

In SSIS, packages are deployed to the SSIS Catalog and executed via SQL Server Agent jobs. Jobs can chain multiple packages with step-level success/failure logic. Master packages call child packages using the Execute Package Task. This creates a hierarchical execution model that is difficult to maintain and monitor across hundreds of packages.

Snowflake Tasks replace this entire orchestration layer. A root task runs on a CRON schedule. Child tasks use the AFTER clause to define dependencies. The WHEN clause provides conditional execution based on stream data availability or custom conditions. Task history is queryable through built-in INFORMATION_SCHEMA views.

-- SSIS Master Package with 4 child packages:
--   1. Extract.dtsx (load raw files)
--   2. Validate.dtsx (data quality checks)
--   3. Transform.dtsx (business logic)
--   4. Publish.dtsx (build gold tables)
-- SQL Agent Job runs at 6 AM daily

-- Snowflake Task DAG equivalent
-- Root task: triggers the pipeline on schedule
CREATE OR REPLACE TASK etl.daily_pipeline_extract
    WAREHOUSE = etl_wh_medium
    SCHEDULE = 'USING CRON 0 6 * * * America/New_York'
    WHEN SYSTEM$STREAM_HAS_DATA('bronze.raw_data_stream')
AS
    CALL etl.extract_daily_data(CURRENT_DATE());

-- Child task 1: validation (runs after extract succeeds)
CREATE OR REPLACE TASK etl.daily_pipeline_validate
    WAREHOUSE = etl_wh_small
    AFTER etl.daily_pipeline_extract
AS
    CALL etl.validate_daily_data(CURRENT_DATE());

-- Child task 2: transformation (runs after validation succeeds)
CREATE OR REPLACE TASK etl.daily_pipeline_transform
    WAREHOUSE = etl_wh_large
    AFTER etl.daily_pipeline_validate
AS
    CALL etl.transform_daily_data(CURRENT_DATE());

-- Child task 3: publish (runs after transformation succeeds)
CREATE OR REPLACE TASK etl.daily_pipeline_publish
    WAREHOUSE = etl_wh_medium
    AFTER etl.daily_pipeline_transform
AS
    CALL etl.publish_gold_tables(CURRENT_DATE());

-- Enable tasks (bottom-up, as required by Snowflake)
ALTER TASK etl.daily_pipeline_publish RESUME;
ALTER TASK etl.daily_pipeline_transform RESUME;
ALTER TASK etl.daily_pipeline_validate RESUME;
ALTER TASK etl.daily_pipeline_extract RESUME;

-- Monitor execution (replaces SSISDB execution reports)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
    TASK_NAME => 'DAILY_PIPELINE_EXTRACT'
))
ORDER BY SCHEDULED_TIME DESC;

Incremental Loading: SSIS Watermark Patterns to Snowflake Streams

SSIS incremental loading typically relies on watermark columns (modified_date, rowversion) stored in control tables, with Execute SQL Tasks reading the last watermark and Data Flow Tasks filtering on the watermark range. This requires manual bookkeeping, careful error handling to ensure watermarks are only updated after successful loads, and custom restart logic for partial failures.

Snowflake Streams provide automatic change data capture (CDC) on any table. A stream tracks inserts, updates, and deletes since the last consumption, eliminating the need for watermark columns, control tables, and all the associated fragile ETL logic.

-- SSIS incremental pattern (conceptual):
-- 1. Execute SQL: SELECT MAX(modified_date) FROM control.watermarks
-- 2. Store result in User::LastWatermark
-- 3. Data Flow: SELECT * FROM orders WHERE modified_date > ?
-- 4. Execute SQL: UPDATE control.watermarks SET last_value = GETDATE()

-- Snowflake Streams: automatic CDC, no watermark management
CREATE OR REPLACE STREAM bronze.orders_stream
    ON TABLE bronze.raw_orders
    SHOW_INITIAL_ROWS = FALSE;

-- The stream automatically tracks changes; consume with MERGE
CREATE OR REPLACE TASK silver.incremental_orders
    WAREHOUSE = etl_wh
    SCHEDULE = 'USING CRON */15 * * * * America/New_York'
    WHEN SYSTEM$STREAM_HAS_DATA('bronze.orders_stream')
AS
    MERGE INTO silver.orders t
    USING (
        SELECT *,
            METADATA$ACTION AS stream_action,
            METADATA$ISUPDATE AS is_update
        FROM bronze.orders_stream
    ) s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.stream_action = 'INSERT' AND s.is_update = TRUE
        THEN UPDATE SET
            t.quantity = s.quantity,
            t.unit_price = s.unit_price,
            t.status = s.status,
            t.updated_at = CURRENT_TIMESTAMP()
    WHEN MATCHED AND s.stream_action = 'DELETE'
        THEN DELETE
    WHEN NOT MATCHED AND s.stream_action = 'INSERT'
        THEN INSERT (order_id, product_id, customer_id, quantity,
                     unit_price, status, created_at, updated_at)
        VALUES (s.order_id, s.product_id, s.customer_id, s.quantity,
                s.unit_price, s.status, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());

ALTER TASK silver.incremental_orders RESUME;

From parsed legacy code to production-ready modern equivalents — MigryX automates the entire conversion pipeline

From Legacy Complexity to Modern Clarity with MigryX

Legacy ETL platforms encode business logic in visual workflows, proprietary XML formats, and platform-specific constructs that are opaque to standard analysis tools. MigryX’s deep parsers crack open these proprietary formats and extract the underlying data transformations, business rules, and data flows. The result is complete transparency into what your legacy code actually does — often revealing undocumented logic that even the original developers had forgotten.

Dynamic Tables: Replacing Simple SSIS Packages Entirely

Many SSIS packages exist solely to run a SQL query on a schedule and write the results to a target table. These packages typically contain a single Data Flow Task with an OLE DB Source (running a query) and an OLE DB Destination. Snowflake Dynamic Tables replace this entire pattern with a single declarative SQL definition that Snowflake keeps automatically refreshed.

-- SSIS Package: "Refresh Active Customer Summary"
-- Contains: One Data Flow Task
--   OLE DB Source: SELECT query with joins and aggregations
--   OLE DB Destination: truncate and reload target table
-- SQL Agent: runs every hour

-- Snowflake Dynamic Table: single definition, auto-refreshed
CREATE OR REPLACE DYNAMIC TABLE gold.active_customer_summary
    TARGET_LAG = '1 hour'
    WAREHOUSE = etl_wh
AS
    SELECT
        c.customer_id,
        c.customer_name,
        c.segment,
        c.region,
        COUNT(DISTINCT o.order_id) AS total_orders,
        SUM(o.quantity * o.unit_price) AS lifetime_revenue,
        AVG(o.quantity * o.unit_price) AS avg_order_value,
        MAX(o.order_date) AS most_recent_order,
        DATEDIFF('day', MAX(o.order_date), CURRENT_DATE()) AS days_since_last_order,
        CASE
            WHEN DATEDIFF('day', MAX(o.order_date), CURRENT_DATE()) <= 30
                THEN 'Active'
            WHEN DATEDIFF('day', MAX(o.order_date), CURRENT_DATE()) <= 90
                THEN 'At Risk'
            ELSE 'Churned'
        END AS activity_status
    FROM silver.customers c
    JOIN silver.orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name, c.segment, c.region;

Dynamic Tables eliminate the need for scheduling, error handling, and retry logic for simple transformation patterns. For SSIS packages that are essentially "run a query and write to a table," a Dynamic Table is the most elegant replacement — zero orchestration, zero maintenance, automatic incremental refresh.

Error Handling and Logging: Event Handlers to Exception Blocks

SSIS provides Event Handlers (OnError, OnWarning, OnPostExecute) and Logging Providers (SQL Server, Windows Event Log, text files) for monitoring package execution. Error rows in Data Flows can be redirected to error outputs for separate handling. In Snowflake, stored procedures support structured exception handling with BEGIN...EXCEPTION...END blocks, and Tasks provide automatic retry and failure notification through error integrations.

-- Error handling in Snowflake stored procedures
-- Replaces SSIS OnError event handler + error row redirection
CREATE OR REPLACE PROCEDURE etl.safe_daily_load(run_date DATE)
RETURNS VARIANT
LANGUAGE SQL
EXECUTE AS CALLER
AS
DECLARE
    error_msg STRING;
    rows_loaded INTEGER DEFAULT 0;
    result VARIANT;
BEGIN
    BEGIN
        -- Attempt the load
        CALL etl.load_daily_sales(:run_date);
        rows_loaded := SQLROWCOUNT;

        -- Log success (replaces SSIS OnPostExecute event handler)
        INSERT INTO audit.execution_log
            (procedure_name, run_date, status, rows_affected, completed_at)
        VALUES
            ('daily_sales', :run_date, 'SUCCESS', :rows_loaded,
             CURRENT_TIMESTAMP());

    EXCEPTION
        WHEN OTHER THEN
            error_msg := SQLERRM;

            -- Log failure (replaces SSIS OnError event handler)
            INSERT INTO audit.execution_log
                (procedure_name, run_date, status, error_message, completed_at)
            VALUES
                ('daily_sales', :run_date, 'FAILED', :error_msg,
                 CURRENT_TIMESTAMP());

            -- Re-raise for Task-level error handling
            RAISE;
    END;

    result := OBJECT_CONSTRUCT(
        'status', 'SUCCESS',
        'run_date', :run_date,
        'rows_loaded', :rows_loaded
    );
    RETURN result;
END;

Time Travel: A Capability SSIS Cannot Provide

Snowflake Time Travel allows querying historical data states without any ETL-managed snapshot tables. This capability has no SSIS equivalent — in SSIS environments, point-in-time data recovery requires full database backup-and-restore operations or manually maintained history tables with slowly changing dimension patterns.

-- Query data as it existed 2 hours ago (no SSIS equivalent)
SELECT * FROM warehouse.fact_sales
AT (OFFSET => -7200);

-- Query data before a specific statement was run
SELECT * FROM warehouse.fact_sales
BEFORE (STATEMENT => '01abc2de-0003-4567-0000-00000000789b');

-- Undo an accidental truncate (no SSIS equivalent)
CREATE OR REPLACE TABLE warehouse.fact_sales
AS SELECT * FROM warehouse.fact_sales
BEFORE (STATEMENT => LAST_QUERY_ID(-1));

Semi-Structured Data: Script Tasks to VARIANT and FLATTEN

In SSIS, processing JSON or XML data typically requires Script Tasks with C# parsing libraries (Newtonsoft.Json, System.Xml.Linq) or third-party components. Snowflake handles semi-structured data as a first-class citizen through the VARIANT data type, PARSE_JSON(), FLATTEN(), and dot-notation path traversal — all in standard SQL.

-- SSIS: Script Task with Newtonsoft.Json to parse API response
-- Snowflake: Native VARIANT handling, no external libraries

-- Ingest JSON directly from stage
COPY INTO bronze.api_responses (payload, loaded_at)
FROM (
    SELECT $1, CURRENT_TIMESTAMP()
    FROM @api_stage/responses/
)
FILE_FORMAT = (TYPE = 'JSON');

-- Flatten nested JSON (replaces C# parsing logic)
SELECT
    r.payload:order_id::INTEGER AS order_id,
    r.payload:customer.name::STRING AS customer_name,
    r.payload:customer.email::STRING AS customer_email,
    item.value:sku::STRING AS sku,
    item.value:quantity::INTEGER AS quantity,
    item.value:unit_price::FLOAT AS unit_price
FROM bronze.api_responses r,
    LATERAL FLATTEN(input => r.payload:items) item
WHERE r.payload:status::STRING = 'confirmed';

Migration Complexity: What MigryX Automates

SSIS DTSX packages are complex XML documents with deeply nested structures. A single package can contain multiple Data Flows, each with dozens of transformations, custom expressions in SSIS expression language, and variable-based routing across container hierarchies. Manually translating these to Snowflake is error-prone and time-consuming, particularly for organizations with hundreds of packages accumulated over years of development.

MigryX uses AST-based deterministic parsing to analyze DTSX package XML at the structural level, not through fragile regex patterns or AI-only approaches. The parser builds a complete abstract syntax tree of each package, extracting Control Flow task sequences, Data Flow transformation graphs, expression trees, variable references, and connection configurations. This approach achieves over 95% parser accuracy across SSIS 2012 through 2022 package formats, including complex packages with nested containers, event handlers, and custom Script Tasks.

MigryX Differentiators for SSIS Migration
AST-based deterministic parsing — parses DTSX XML into a complete abstract syntax tree, capturing every transformation, expression, and data path. No regex matching or pattern heuristics.
+95% parser accuracy — handles complex packages with nested containers, event handlers, and custom script tasks that regex-based tools miss entirely.
Column-level lineage — traces data flow from source columns through every transformation to target columns, generating complete STTM (Source-to-Target Mapping) documentation automatically.
Multi-target output — generates Snowflake SQL, Snowpark Python, and Task DAG definitions from a single DTSX analysis. The same package analysis can target multiple output formats.
On-premise and air-gapped deployment — runs entirely within your network. No DTSX files, no database schemas, no data leaves your environment. Critical for regulated industries and government organizations.
STTM documentation — automatically produces comprehensive Source-to-Target Mapping documents covering every column, transformation rule, and data lineage path across the entire SSIS estate.
Merlin AI assistant — provides interactive guidance for complex conversion decisions, such as choosing between Snowpark Python and SQL for specific transformations, translating C# Script Task code to Python, and suggesting Snowflake-native optimizations like Dynamic Tables.

Migration Comparison: SSIS Operational Model vs. Snowflake

Dimension	SSIS	Snowflake
Compute	Dedicated Windows servers or Azure SSIS IR	Elastic virtual warehouses, auto-suspend/resume
Scheduling	SQL Server Agent / Azure Data Factory	Snowflake Tasks with CRON expressions
Incremental Load	Manual watermark tables, custom logic	Streams with automatic CDC
Error Handling	Event handlers, logging providers	TRY/CATCH, TASK_HISTORY, QUERY_HISTORY
File Ingestion	Flat File Source, custom Script Tasks	Stages, COPY INTO, Snowpipe
Semi-structured Data	Script Task with JSON parsing libraries	Native VARIANT, FLATTEN, PARSE_JSON
Parameterization	Package parameters, environment variables	Procedure params, session variables
Deployment	ISPAC to SSIS Catalog, environment configs	SQL scripts, Git-based deployment
Point-in-Time Recovery	Database backups only	Time Travel (up to 90 days)
Cost Model	Fixed infrastructure (servers, licenses)	Pay-per-second compute consumption
Data Locality	Data moves through SSIS buffers to target	Compute runs where data lives
Maintenance	OS patching, SSIS updates, driver upgrades	Fully managed, zero maintenance

Key Takeaways
Every SSIS Control Flow task has a Snowflake equivalent: Execute SQL becomes stored procedures, For Each Loop becomes Snowflake Scripting loops, Script Tasks become Snowpark Python UDFs and procedures.
SSIS Data Flow transformations — Derived Column, Conditional Split, Lookup, Aggregate — map directly to SQL expressions and JOINs or Snowpark DataFrame operations.
Connection Managers are replaced by Stages, Storage Integrations, and File Formats that provide secure, managed cloud storage connectivity without server-side credential management.
SSIS variables become session variables or stored procedure parameters, with the same scoping and type safety capabilities.
Snowflake Streams eliminate the manual watermark-based incremental loading patterns that make SSIS packages fragile and difficult to restart after failures.
Dynamic Tables can replace entire SSIS packages that simply run a query and write results on a schedule — zero orchestration, automatic refresh.
MigryX automates DTSX package analysis using AST-based parsing with over 95% accuracy, generating Snowflake SQL, Snowpark Python, and Task DAGs with complete column-level lineage documentation.

Migrating from SSIS to Snowflake is not a lift-and-shift — it is an architectural transformation that eliminates the dedicated ETL server, replaces buffer-based pipelines with set-based SQL operations, and consolidates scheduling, CDC, and error handling into a single platform. The result is lower cost (pay only for compute seconds used), lower maintenance (no servers to patch), faster execution at any data volume, and a unified platform for storage, compute, transformation, and orchestration. For organizations with large SSIS estates, automated tooling like MigryX is essential to maintain velocity and accuracy through the migration.

Why MigryX Is the Only Platform That Handles This Migration

The challenges described throughout this article are exactly what MigryX was built to solve. Here is how MigryX transforms this process:

Deep AST parsing: MigryX’s custom-built parsers achieve 95% accuracy on every supported legacy technology — not through approximation, but through true semantic understanding.
Merlin AI augmentation: Where deterministic parsing reaches its limit, Merlin AI resolves ambiguities and implicit behaviors, pushing accuracy to 99%.
Complete coverage: MigryX supports 25+ source technologies including SAS, Informatica, DataStage, SSIS, Alteryx, Talend, ODI, Teradata, and Oracle PL/SQL.
End-to-end automation: From parsing to conversion to validation — MigryX automates the entire pipeline, not just one step.

MigryX combines precision AST parsing with Merlin AI to deliver 99% accurate, production-ready migration — turning what used to be a multi-year manual effort into a streamlined, validated process. See it in action.

Ready to migrate from SSIS to Snowflake?

See how MigryX converts DTSX packages to production-ready Snowflake SQL, Snowpark pipelines, and Task DAGs with complete column-level lineage.

Explore Snowflake Migration Schedule a Demo