Feature guides

Migration Center

The Migration Center is the AI-powered tooling that converts Polkomtel's legacy ETL estate — Informatica PowerCenter, Alteryx, SSIS, and DataStage workflows — into native DataFlow AI YAML pipelines, with automated parsing, rule-based plus LLM-assisted conversion, per-object confidence scoring, and data-parity validation.


Who uses the Migration Center

The Migration Center is used primarily by the Data Engineer persona (Anna Kowalska) and the Platform Admin persona (Katarzyna Zielińska). Engineers drive day-to-day conversions; admins oversee the overall migration program.

For Polkomtel the scope is large — 500+ PowerCenter workflows plus 50–100 Alteryx workflows, totaling roughly 550–600 assets to migrate.

| What it migrates | Source file | Engine |
| --- | --- | --- |
| Informatica PowerCenter | .xml | rule engine + LLM fallback |
| Alteryx | .yxmd | rule engine + LLM fallback |
| SSIS | .dtsx | rule engine + LLM fallback |
| DataStage | .dsx | parser exists (rule-engine wired) |

Note

The Migration Center migrates legacy ETL tools, not SQL dialects. The upload step accepts PowerCenter XML, Alteryx YXMD, and SSIS DTSX files; a DataStage parser also exists in the engine.


Module layout

The Migration Center mounts at /migration (redirecting to /migration/import; entry src/pages/MigrationCenter.tsx, layout src/pages/migration/MigrationLayout.tsx). A left sidebar lists the four screens; a breadcrumb trail sits at the top.

+------------------------------------------------------------------+
| Home > Migration Center > Import Wizard                          |
+-------------------------+----------------------------------------+
| Sidebar (240px)         | Main content area                      |
|  ARROW Migration Center | +-----------------------------------+  |
|   > Import Wizard       | |                                   |  |
|   > AI Conversion       | |   (Screen content renders here)   |  |
|   > Validation Suite    | |                                   |  |
|   > Progress Tracker    | +-----------------------------------+  |
+-------------------------+----------------------------------------+

The four screens:

| Screen | Route | Purpose |
| --- | --- | --- |
| Import Wizard | /migration/import | Upload, AI-analyze, and report on source workflow files |
| AI Conversion Dashboard | /migration/conversion | Monitor auto-conversion results by object type |
| Validation Suite | /migration/validation | Run data-parity tests, compare source vs target |
| Progress Tracker | /migration/progress | Track migration phases, velocity, timeline, risks |

Conversion pipeline — what happens under the hood

The migration engine runs a fixed lifecycle on every uploaded file: uploaded → parsing → parsed → converting → validating → validated → completed | completed_with_warnings | failed.
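
The lifecycle above reads naturally as a small state machine. The following is a minimal sketch, not the engine's actual code: the JobStatus enum and allowed_next helper are hypothetical names, and the sketch takes the lifecycle literally, so the three terminal outcomes are only reachable from validated. Whether the real engine can also fail earlier (for example on a parse error) is not stated here.

```python
from enum import Enum
from typing import Set


class JobStatus(str, Enum):
    """Status values copied from the lifecycle in this guide."""
    UPLOADED = "uploaded"
    PARSING = "parsing"
    PARSED = "parsed"
    CONVERTING = "converting"
    VALIDATING = "validating"
    VALIDATED = "validated"
    COMPLETED = "completed"
    COMPLETED_WITH_WARNINGS = "completed_with_warnings"
    FAILED = "failed"


# Non-terminal stages in the order the guide lists them.
_LINEAR = [
    JobStatus.UPLOADED, JobStatus.PARSING, JobStatus.PARSED,
    JobStatus.CONVERTING, JobStatus.VALIDATING, JobStatus.VALIDATED,
]
TERMINAL = frozenset({
    JobStatus.COMPLETED, JobStatus.COMPLETED_WITH_WARNINGS, JobStatus.FAILED,
})


def allowed_next(status: JobStatus) -> Set[JobStatus]:
    """Return the statuses a job may move to next (assumed forward-only)."""
    if status in TERMINAL:
        return set()
    if status is JobStatus.VALIDATED:
        return set(TERMINAL)
    return {_LINEAR[_LINEAR.index(status) + 1]}
```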

| Stage | What it does |
| --- | --- |
| 1. Parse | A tool-specific parser turns the source file into a common WorkflowAST of mappings, transformations, and connectors. |
| 2. Convert | A deterministic RuleEngine walks each transformation; 13 PowerCenter transform types have explicit conversion rules. Types with no rule, or confidence below 0.60, fall through to an LLM converter. |
| 3. Validate | A PipelineValidator runs six checks on the generated YAML, including syntax, required keys, node schema, edge references, and a dangerous-SQL scan. |
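
The Convert stage's rule-first, LLM-second routing can be illustrated roughly as follows. This is a sketch under assumptions: the Conversion dataclass, the convert function, the "rule" and "llm" source labels, and the llm_convert callback are all hypothetical, and the RULES table holds only a few pairs taken from this guide. Only the 0.60 threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

LLM_FALLBACK_THRESHOLD = 0.60  # stated in this guide


@dataclass
class Conversion:
    target: str
    confidence: float
    conversion_source: str  # "rule" or "llm" (labels assumed for this sketch)


# Hypothetical rule table: transform type -> (DataFlow target, confidence).
RULES: Dict[str, Tuple[str, float]] = {
    "Filter": ("sql_where", 0.98),
    "Sorter": ("sql_order_by", 0.98),
    "Update Strategy": ("upsert_strategy", 0.80),
}


def convert(
    transform_type: str,
    llm_convert: Callable[[str], Conversion],
    rules: Dict[str, Tuple[str, float]] = RULES,
) -> Conversion:
    """Use a rule when one exists and clears the threshold, else the LLM."""
    rule = rules.get(transform_type)
    if rule is not None and rule[1] >= LLM_FALLBACK_THRESHOLD:
        return Conversion(rule[0], rule[1], "rule")
    return llm_convert(transform_type)
```

Passing the LLM converter in as a callback keeps the deterministic path testable without any model access.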

Conversion is deterministic where possible. PowerCenter transform rules and their target nodes:

| Source type | DataFlow target | Confidence |
| --- | --- | --- |
| Source Qualifier | source connector SQL push-down | 0.90–0.95 |
| Expression | sql_expression | 0.85 |
| Lookup Procedure | sql_join_pushdown | 0.65–0.85 |
| Aggregator | sql_group_by | 0.95 |
| Filter | sql_where | 0.98 |
| Joiner | sql_join | 0.90 |
| Sorter | sql_order_by | 0.98 |
| Router | conditional_branch (CASE WHEN) | 0.85 |
| Update Strategy | upsert_strategy | 0.80 |
| Union | sql_union_all | 0.95 |

The release gate: a job is marked completed only if overall confidence is at or above 0.85, no object scores below 0.80, no object needs manual review, and there are no validation issues. Otherwise it is marked failed.
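
The four release-gate conditions translate directly into a predicate. A minimal sketch, assuming a hypothetical ObjectResult record and taking the overall confidence as an input, since this guide does not say how it is aggregated:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObjectResult:
    confidence: float
    requires_manual_review: bool = False


def passes_release_gate(
    overall_confidence: float,
    objects: List[ObjectResult],
    validation_issues: Sequence[str],
) -> bool:
    """True only when all four gate conditions from the guide hold."""
    return (
        overall_confidence >= 0.85
        and all(o.confidence >= 0.80 for o in objects)
        and not any(o.requires_manual_review for o in objects)
        and len(validation_issues) == 0
    )
```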

Note

Every AI output in the Migration Center carries a 0–1 confidence score. The engine distinguishes genuine LLM output from failure fallbacks: when the LLM is unavailable, conversions return conversion_source = "llm_unavailable", confidence 0.0, and requires_manual_review = true.
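
One way to keep genuine LLM output distinguishable from a failure fallback is to build the fallback record explicitly. The sketch below is illustrative only: llm_convert and call_llm are hypothetical names, catching ConnectionError is an assumption about how unavailability surfaces, and the "llm" label and the requires_manual_review value on the success path are assumed. The three fallback field values are the ones quoted in the note above.

```python
from typing import Any, Callable, Dict, Tuple


def llm_convert(
    transform_source: str,
    call_llm: Callable[[str], Tuple[str, float]],
) -> Dict[str, Any]:
    """Return an LLM conversion record, or the documented fallback record."""
    try:
        yaml_fragment, confidence = call_llm(transform_source)
    except ConnectionError:
        # Fallback values as described in the note: never mistaken for real output.
        return {
            "conversion_source": "llm_unavailable",
            "confidence": 0.0,
            "requires_manual_review": True,
            "yaml": None,
        }
    return {
        "conversion_source": "llm",       # label assumed for this sketch
        "confidence": confidence,
        "requires_manual_review": False,  # assumed; the real policy may differ
        "yaml": yaml_fragment,
    }
```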


Screen 1 — Import Wizard

Route: /migration/import — entry src/pages/migration/ImportWizardPage.tsx.

A four-step wizard with a horizontal step indicator.

+------------------------------------------------------------------+
|  (1) Upload --- (2) AI Analysis --- (3) Report --- (4) Convert    |
+------------------------------------------------------------------+

Step 1 — Upload Files

A full-width drag-and-drop zone (dashed border, hover and drag-over states) with a Browse Files fallback. Each dropped file auto-detects its type — PowerCenter XML (orange badge), Alteryx Workflow (blue badge), or Unsupported (red badge) — and a quick parse reports the object count ("Detected: 7 mappings, 42 transformations"). Each file is shown as a card with its icon, name, size, type badge, an upload progress bar, and a Remove button. When more than one file is added, a batch indicator notes that all files will be analyzed together. The Analyze with AI button activates once at least one file is ready.
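
The badge assignment in Step 1 can be approximated with a lookup on the file extension. This is a sketch under a stated assumption: the real wizard may inspect file contents rather than the extension, and detect_badge is a hypothetical helper. The badge labels and colors are the ones listed above.

```python
from pathlib import Path
from typing import Tuple

# Extension -> (badge label, badge color), per the Step 1 description.
BADGES = {
    ".xml": ("PowerCenter XML", "orange"),
    ".yxmd": ("Alteryx Workflow", "blue"),
}


def detect_badge(filename: str) -> Tuple[str, str]:
    """Classify an uploaded file by extension (case-insensitive)."""
    return BADGES.get(Path(filename).suffix.lower(), ("Unsupported", "red"))
```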

Step 2 — AI Analysis

A full-width overall progress bar plus a four-stage vertical checklist — Parsing Source Files → Analyzing Objects → Checking Compatibility → Generating Report — each stage card showing pending / in-progress (spinner) / completed (green check) state. A live, scrolling, terminal-styled Analysis Feed streams color-coded messages (info/success/warning/error) as the engine works through each workflow. The wizard auto-advances to Step 3 when analysis reaches 100%.

Step 3 — Compatibility Report

Four summary cards across the top — Total Objects, Auto-Convertible (count and %), Manual Required (count and %), and Estimated Effort (hours). Below them:

  • Workflow Assessment table — one row per workflow with a complexity badge (Low / Medium / Medium-High / High / Very High), object count, auto-convert %, estimated effort hours, and source/target systems.
  • Object Type Breakdown — a horizontal stacked bar chart, auto-convertible portion in indigo, manual portion in amber; object types under 50% auto-convert are flagged red.
  • Risk Items panel — collapsible severity-coded cards (high/medium/low), each with a description, the affected workflow, and an expandable recommendation.
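
The per-type figures behind the Object Type Breakdown chart could be derived as in the sketch below. The object_type_breakdown function and its input shape are hypothetical, and the example counts are made up for illustration; only the under-50% red flag comes from the description above.

```python
from typing import Dict, List, Tuple


def object_type_breakdown(counts: Dict[str, Tuple[int, int]]) -> List[dict]:
    """counts maps object type -> (auto_convertible, manual_required)."""
    rows = []
    for obj_type, (auto, manual) in counts.items():
        total = auto + manual
        pct = 100.0 * auto / total if total else 0.0
        rows.append({
            "type": obj_type,
            "auto_pct": round(pct, 1),
            "flag_red": pct < 50.0,  # types under 50% auto-convert are flagged
        })
    return rows
```

For example, object_type_breakdown({"Filter": (9, 1), "Custom Java": (1, 3)}) marks only the Custom Java row as flagged.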

A Start AI Conversion button moves to Screen 2.

Behind the scenes

The frontend api/migration.ts posts the multipart upload to the migration-engine /upload endpoint, which runs all conversion stages inline. The engine's parsers use lxml for XML/YXMD and a proprietary text parser for DataStage; the max_upload_size_mb limit is 50 MB.


Screen 2 — AI Conversion Dashboard

Route: /migration/conversion — entry src/pages/migration/ConversionDashboardPage.tsx.

+------------------------------------------------------------------+
| [Total Objects 150] [Auto-Converted 127 (85%)] [Manual 23 (15%)] |
+------------------------------------------------------------------+
| Conversion Status by Object Type (table)                          |
+------------------------------------------------------------------+
| Converted Pipeline List (cards)   | Confidence Distribution chart |
+------------------------------------------------------------------+

Three summary cards lead the screen — Total Objects, Auto-Converted (with a green donut), Manual Required (with an amber donut) — plus average confidence.

The Conversion Status by Object Type table breaks results down per transform type (Source Qualifier, Expression, Lookup, Filter, Joiner, Custom Java, Router, Other): converted/total, conversion rate with an inline bar, average confidence, and a status of complete / partial / flagged. Flagged rows (such as Custom Java relying on proprietary Siebel CDMA libraries) get a red left-border, a red row tint, a warning icon, and a flag-reason line.

The Converted Pipeline List shows a card per converted pipeline — original → converted name (e.g. df_sap_biuro_sprzedazy_plk.yaml), a circular confidence badge (green ≥90, amber 70–89, red <70), object counts, source/target system pills, a status badge, and an Open in Design Studio link.
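
The badge thresholds map to a three-way branch. A minimal sketch, assuming the engine's 0–1 confidence is rounded to a whole percentage before comparison (the rounding rule is an assumption, and badge_color is a hypothetical helper):

```python
def badge_color(confidence: float) -> str:
    """Map a 0-1 confidence score to the badge color described above."""
    pct = round(confidence * 100)
    if pct >= 90:
        return "green"
    if pct >= 70:
        return "amber"
    return "red"
```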

A Confidence Distribution histogram bins pipelines by confidence range (0-50%, 50-70%, 70-85%, 85-95%, 95-100%).
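
Binning for the histogram can be sketched with bisect. The bin labels are the ones listed above; how boundary values are assigned is not stated, so this sketch assumes each bin includes its lower edge and the last bin includes 100%. The histogram function is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Dict, Iterable

EDGES = [50, 70, 85, 95]  # upper edges of all bins except the last
LABELS = ["0-50%", "50-70%", "70-85%", "85-95%", "95-100%"]


def histogram(confidences_pct: Iterable[float]) -> Dict[str, int]:
    """Count pipelines per confidence bin (lower edge inclusive, assumed)."""
    counts = {label: 0 for label in LABELS}
    for pct in confidences_pct:
        counts[LABELS[bisect_right(EDGES, pct)]] += 1
    return counts
```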

Figure: The AI Conversion Dashboard after converting an Alteryx workflow to a DataFlow AI YAML pipeline. Each converted pipeline carries a confidence badge and an Open in Design Studio link.

Alteryx workflows convert the same way PowerCenter workflows do: the engine parses the .yxmd file into a common WorkflowAST, the rule engine maps each tool to a DataFlow node, and anything without a rule or below 0.60 confidence falls through to the LLM converter. Alteryx-specific constructs such as environment macros (GetEnvironmentVariable used for PROD/Test routing) are flagged for manual review rather than auto-converted, and surface as red-bordered rows in the Conversion Status table and as Risk Items in the Compatibility Report. A converted Alteryx pipeline shows a blue Alteryx source-type badge to distinguish it from the orange PowerCenter badge.

Behind the scenes

The conversion produces DataFlow AI YAML pipelines. The YamlGenerator classifies nodes into sources/transformations/targets, builds depends_on linkage from edges, and injects three default quality checks (row_count, null_percentage, duplicate) plus per-sink schema-validation checks and a low-confidence reconciliation check for any mapping under 0.80 confidence.
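
The dependency linkage and check injection described above might look roughly like this. It is a sketch, not the YamlGenerator itself: both function names and every dictionary key ("type", "sink", "mapping", "reason") are hypothetical, and the result is a plain Python structure rather than serialized YAML. The three default check names and the 0.80 threshold come from the text.

```python
from typing import Dict, Iterable, List, Tuple

DEFAULT_CHECKS = ["row_count", "null_percentage", "duplicate"]


def build_depends_on(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (upstream, downstream) edges into a depends_on map per node."""
    deps: Dict[str, List[str]] = {}
    for upstream, downstream in edges:
        deps.setdefault(downstream, []).append(upstream)
    return deps


def build_quality_checks(
    sinks: Iterable[str],
    mapping_confidences: Dict[str, float],
) -> List[dict]:
    """Default checks, one schema check per sink, reconciliation if < 0.80."""
    checks: List[dict] = [{"type": name} for name in DEFAULT_CHECKS]
    checks += [{"type": "schema_validation", "sink": s} for s in sinks]
    checks += [
        {"type": "reconciliation", "mapping": m, "reason": "low_confidence"}
        for m, conf in mapping_confidences.items()
        if conf < 0.80
    ]
    return checks
```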


Screen 3 — Validation Suite

Route: /migration/validation — entry src/pages/migration/ValidationSuitePage.tsx.

The Validation Suite proves data parity between the legacy source and the converted DataFlow AI pipeline.

A Test Runner control bar at the top exposes Run All Tests and Re-run Failed, a status dot (idle / running / completed), and progress text. Four summary cards follow — Total Tests, Passed, Failed, Pass Rate.

The Pipeline Comparison Table lists each pipeline with its source and target systems, source vs target row counts, a row-count-match flag, a checksum-match flag, a column-diff count, execution time, and a pass/fail status. Failed rows are red-tinted with a red left-border and expand to reveal per-column failure detail — the column name, expected vs actual value, row index, and diff type (value_mismatch, null_mismatch, type_mismatch, or missing_row).
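
The per-column failure detail could be produced by a row-by-row comparison like the sketch below. The diff_rows function, the positional row matching, and the order in which diff types are decided are all assumptions for illustration; only the four diff_type names and the reported fields come from the description above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_rows(source: List[Row], target: List[Row]) -> List[dict]:
    """Compare rows by position and describe each mismatch."""
    failures = []
    for i, src in enumerate(source):
        if i >= len(target):
            failures.append({"row_index": i, "column": None, "expected": src,
                             "actual": None, "diff_type": "missing_row"})
            continue
        for col, expected in src.items():
            actual = target[i].get(col)
            if expected == actual:
                continue
            if (expected is None) != (actual is None):
                kind = "null_mismatch"
            elif type(expected) is not type(actual):
                kind = "type_mismatch"
            else:
                kind = "value_mismatch"
            failures.append({"row_index": i, "column": col, "expected": expected,
                             "actual": actual, "diff_type": kind})
    return failures
```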

Behind the scenes

api/migration.ts calls the engine's /jobs/{id}/validate endpoint, which runs a six-check validation suite — YAML syntax, SQL safety (scanning for DROP TABLE/DATABASE, TRUNCATE, ALTER TABLE, EXEC, xp_cmdshell), transform coverage at or above 50%, confidence at or above 60%, source-and-sink node completeness, and mapping-issue checks.
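
The SQL safety check can be pictured as a keyword scan. A minimal sketch using Python's re module: the keyword list is the one given above, while the exact pattern, the case-insensitive whole-word matching, and the find_dangerous_sql name are assumptions about how such a scan might be written.

```python
import re
from typing import List

# Keywords from the guide; whole-word, case-insensitive matching is assumed.
DANGEROUS_SQL = re.compile(
    r"\b(DROP\s+(?:TABLE|DATABASE)|TRUNCATE|ALTER\s+TABLE|EXEC|xp_cmdshell)\b",
    re.IGNORECASE,
)


def find_dangerous_sql(sql: str) -> List[str]:
    """Return the dangerous keywords found, normalized to upper case."""
    return [" ".join(m.group(1).split()).upper()
            for m in DANGEROUS_SQL.finditer(sql)]
```

Because the match is on whole words, this sketch does not flag EXECUTE; a stricter scan would need to add it explicitly.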


Screen 4 — Migration Progress Tracker

Route: /migration/progress — entry src/pages/migration/ProgressTrackerPage.tsx.

The Progress Tracker gives a program-level view of the entire migration: the overall phases, conversion velocity, a timeline, and outstanding risks. It is the screen the Platform Admin uses to report migration status, with effort estimated by a person-hour model (0.25h base per mapping, 2.0h per manual review, 1.0h per LLM-assisted conversion, 0.5h testing, 4.0h integration).
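
The person-hour model can be written out as simple arithmetic. The rates are the ones quoted above, but how they combine is not spelled out, so this sketch assumes the base and testing rates apply per mapping and the integration cost applies once per estimate. The estimate_effort_hours helper is hypothetical.

```python
BASE_H_PER_MAPPING = 0.25
MANUAL_REVIEW_H = 2.0
LLM_ASSISTED_H = 1.0
TESTING_H_PER_MAPPING = 0.5  # assumed to apply per mapping
INTEGRATION_H = 4.0          # assumed to apply once per estimate


def estimate_effort_hours(mappings: int, manual_reviews: int,
                          llm_conversions: int) -> float:
    """Estimate person-hours under the assumptions stated above."""
    return (
        mappings * (BASE_H_PER_MAPPING + TESTING_H_PER_MAPPING)
        + manual_reviews * MANUAL_REVIEW_H
        + llm_conversions * LLM_ASSISTED_H
        + INTEGRATION_H
    )
```

Under these assumptions, a workflow with 7 mappings, 1 manual review, and 2 LLM-assisted conversions comes to 7 × 0.75 + 2.0 + 2.0 + 4.0 = 13.25 hours.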


Click-path — migrate a legacy Informatica workflow end-to-end

  1. Open /migration/import.
  2. Drag the PowerCenter export (e.g. wf_E112.XML) onto the drop zone. The card shows an orange PowerCenter XML badge and a detected object count.
  3. Click Analyze with AI. The wizard moves to Step 2; watch the four-stage checklist and the live analysis feed stream parsing, classification, and compatibility messages.
  4. When analysis completes, the wizard auto-advances to the Compatibility Report. Review the four summary cards, the per-workflow complexity badges, the object-type breakdown chart, and the Risk Items panel — expand high-severity items (such as Custom Java transformations needing a Python UDF rewrite) to read the recommendation.
  5. Click Start AI Conversion — the wizard navigates to /migration/conversion.
  6. On the AI Conversion Dashboard, inspect the Conversion Status by Object Type table. Flagged object types (red) need manual attention; partial and complete types are mostly automated.
  7. In the Converted Pipeline List, click a pipeline card's Open in Design Studio to inspect or fix the generated YAML pipeline.
  8. Go to /migration/validation and click Run All Tests to verify data parity.
  9. Review the Pipeline Comparison Table — expand any failed row to read the per-column failure detail, fix the conversion in Design Studio, then Re-run Failed.
  10. Track overall program status on /migration/progress.

Migration sub-page map

| Sub-page | Route |
| --- | --- |
| Import Wizard | /migration/import |
| AI Conversion Dashboard | /migration/conversion |
| Validation Suite | /migration/validation |
| Migration Progress Tracker | /migration/progress |

API reference

| Concern | Endpoint / module |
| --- | --- |
| Upload & inline conversion | migration-engine POST /upload via api/migration.ts |
| Job list | GET /jobs |
| Job status | GET /jobs/{id}/status |
| Job report | GET /jobs/{id}/report |
| Download converted YAML | GET /jobs/{id}/download |
| Run validation | POST /jobs/{id}/validate |
| Re-trigger conversion | POST /jobs/{id}/convert |

The migration-engine is a FastAPI service (port 8091) mounted under /api/v1/migration and /api/migration. It uses rule-based pattern matching for deterministic transforms and falls back to the Anthropic SDK for complex transformations. Converted YAML pipelines are consumed directly by Design Studio.
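
For scripting against the per-job endpoints, the requests can be assembled with the standard library. This is a sketch under assumptions: the localhost host name, direct access on port 8091 with the /api/v1/migration prefix, and the job_request helper are illustrative, and the sketch only builds the request object without sending it. Methods and paths follow the table above.

```python
from urllib.parse import quote
from urllib.request import Request

# Host is an assumption; port and prefix are the ones named in this guide.
BASE_URL = "http://localhost:8091/api/v1/migration"

# action -> HTTP method, per the API reference table.
JOB_ROUTES = {
    "status": "GET",
    "report": "GET",
    "download": "GET",
    "validate": "POST",
    "convert": "POST",
}


def job_request(job_id: str, action: str) -> Request:
    """Build (but do not send) a request for one of the per-job endpoints."""
    method = JOB_ROUTES[action]
    url = f"{BASE_URL}/jobs/{quote(job_id, safe='')}/{action}"
    return Request(url, method=method)
```

A caller would pass the result to urllib.request.urlopen, adding whatever authentication the deployment requires.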
