DataFlow AI - The AI-native data integration platform.

This is the full story of how a Polkomtel analyst's hand-built Alteryx workflow — a chain of colourful tool boxes that has lived on one person's laptop for years — becomes a governed, cloud-based DataFlow AI pipeline. We follow every screen and every decision, explained so that someone who has never written a line of code can follow along.

Meet the people and the problem

Let us introduce Joanna. Joanna works in marketing analytics at Polkomtel Plus. She is not a software engineer; she is an analyst who needs to combine data from spreadsheets and databases to answer business questions. For years her favourite tool has been Alteryx Designer — a program where you build a data process by dragging little coloured tool icons onto a canvas and joining them with lines. No code, just boxes and arrows.

Alteryx is wonderful for one person experimenting at their desk. But it has grown into a quiet problem for Polkomtel:

Each workflow is a .yxmd file that lives on an analyst's laptop or a shared drive. There is no central repository, no version control, no lineage — no record of who changed what, or where the data came from.
The workflows are not promotable to production. They run when Joanna opens Alteryx and clicks Run. If Joanna is on holiday, the report does not happen.
Polkomtel pays around $200,000 a year for Alteryx licences, on top of the much larger Informatica bill.

Polkomtel has 50 to 100 Alteryx workflows like Joanna's, and wants them all moved onto DataFlow AI — the same modern, governed, cloud platform the Informatica workflows are migrating to. There they become scheduled, versioned, monitored pipelines that run on their own. The Migration Center inside DataFlow AI does most of that conversion automatically.

In plain terms

A "migration" simply means moving something to a new home. We are moving the instructions — Joanna's chain of tool boxes — from Alteryx into DataFlow AI. The spreadsheets and databases the workflow reads and writes do not move; only the tool that drives the process changes.

What an Alteryx workflow actually is

Before Joanna can move a workflow, it helps to know what one is built from. An Alteryx workflow is far flatter and simpler than an Informatica one — there are no nested sessions or mappings, just a single canvas.

Alteryx term	Everyday analogy	What it really is
Workflow (`.yxmd` file)	One assembly line drawn on paper	The whole process — a set of tools wired together
Tool	One station on the assembly line	A single processing step — read a file, filter rows, do a calculation, join, summarise
Tool ID	The station's number badge	A unique number Alteryx gives every tool so connections know what links to what
Configuration	The dials and settings on that station	The tool-specific settings — which column to filter on, what formula to apply
Annotation	A sticky note on the station	The friendly label the analyst typed for the tool
Connection	The conveyor belt between two stations	A line carrying data from one tool's output to the next tool's input
Macro (`.yxmc` file)	A pre-packaged mini-machine bolted into the line	A reusable sub-workflow saved as a separate file and dropped into bigger workflows

So an Alteryx workflow is just a flat set of tools joined by connections. A .yxmd file is actually XML underneath — structured text the Migration Center can read. DataFlow AI uses the same idea but calls the tools nodes and the connections edges, assembled into a pipeline.

Inside the .yxmd, the Migration Center expects this shape:

AlteryxDocument
 ├─ Properties → MetaInfo → Name        (the workflow name, optional)
 ├─ Nodes
 │   └─ Node  (each has a ToolID)
 │        ├─ GuiSettings  (Plugin="…Filter.Filter" — tells us the tool type)
 │        └─ Properties
 │             ├─ Configuration         (the tool's settings)
 │             └─ Annotation → Name      (the friendly label)
 └─ Connections
      └─ Connection → Origin / Destination

The Migration Center works out what kind of tool each node is by reading its Plugin attribute and taking the last word: AlteryxBasePluginsGui.Filter.Filter simply means a Filter tool.

The example we will follow: `EksportujDoBazyWsparcia.yxmd`

Joanna's first migration is a real Polkomtel workflow: EksportujDoBazyWsparcia.yxmd, built in Alteryx 2022.3. The name means roughly "export to the support database."

In plain words, this workflow handles courier and logistics (the "BDL" shipment data). It reads from an Excel spreadsheet and a Teradata database, runs the data through about 50 tools arranged in 4 parallel streams, removes duplicates across the different sources, routes rows depending on the environment, and writes the results out to MSSQL and Teradata.

It is rated Medium-High complexity, and the Migration Center expects roughly 78%–82% automatic conversion. The remaining 18%-or-so is mostly custom macros — .yxmc files like CountRecords.yxmc, Log_BDL_LH.yxmc, and IdImportu.yxmc — plus one stored procedure. We will see exactly how those get handled.

Step 1 — Locate and prepare the .yxmd file

Joanna does not need to "export" anything — the .yxmd file is the workflow. She simply finds EksportujDoBazyWsparcia.yxmd on the shared drive where it lives.

Two practical points:

The file must end in .yxmd — that is how the Migration Center recognises an Alteryx workflow. (Alteryx macros are saved as .yxmc files; those cannot be uploaded directly and are handled differently — see Step 7.)
The file must be 50 MB or smaller. Workflow definitions are small text files, so this is rarely a concern.

If the workflow relies on environment-parameterising macros — small macros that just feed in things like a database name or a date — Polkomtel's plan is to exclude those from the automatic migration and handle the parameters natively in DataFlow AI instead.

In plain terms

Joanna is not changing her Alteryx workflow at all. She is handing the Migration Center a copy of the blueprint. The original .yxmd keeps working on her laptop, exactly as before, until the new pipeline has been proven trustworthy.

Step 2 — Upload the file to the Migration Center

Joanna's engineering colleague Marek does the upload (creating pipelines needs the Data Engineer role). He logs in to DataFlow AI at https://dataflow.polkomtel.internal with his normal Polkomtel single sign-on, clicks Migration in the left sidebar, and lands on the Import Wizard.

He drags EksportujDoBazyWsparcia.yxmd onto the drop zone. The Migration Center immediately runs a few safety checks:

Is the filename present and the extension allowed? It must be .xml, .yxmd, or .dtsx. .yxmd is on the list, so the file passes.
Is the file under 50 MB? Yes.
It creates a tracked migration job, marks it UPLOADED, saves the raw file securely, and writes the job record to disk so nothing is lost.

From this one upload the Migration Center runs the entire conversion pipeline automatically. Marek just watches the job status change through these stages:

UPLOADED → PARSING → PARSED → CONVERTING → VALIDATING → VALIDATED
         → COMPLETED  |  COMPLETED_WITH_WARNINGS  |  FAILED

The next sections explain what each stage does.

Step 3 — Parsing: reading the canvas into a common language

The job status changes to PARSING. The Migration Center picks the Alteryx Parser and reads the .yxmd file.

The parser walks every Node in the file and sorts each tool into one of three buckets:

Input tools (such as AlteryxInput, DbFileInput, FileInput) become sources — they pull in the file path, connection, and table.
Output tools (such as AlteryxOutput, DbFileOutput, FileOutput) become targets.
Everything else becomes a transformation — a processing step.

For each tool it also reads the right details depending on the tool's type:

A Formula tool's calculation rows (each: field, expression, type).
A Summarize tool's summary rows (each: field, action).
A Filter tool's keep-condition text.
A Select tool's chosen fields.

Finally it reads every Connection and records it as an edge — an arrow — linking the right tools by their Tool ID.

The result is a tool-neutral internal description of the workflow. This is the clever part: PowerCenter, Alteryx, SSIS, and DataStage all get translated into the same internal shape, so DataFlow AI only needs one conversion engine afterwards.

If the .yxmd is malformed, the job stops here as FAILED with a clear message. For Joanna's valid file, parsing succeeds and the status becomes PARSED.

In plain terms

Parsing is like someone carefully reading Joanna's hand-drawn assembly-line diagram and re-drawing it onto a clean, standard template. The process has not changed — only the way it is written down.

Step 4 — Converting: turning each Alteryx tool into a DataFlow node

The job status changes to CONVERTING, and the Rule Engine now produces the equivalent DataFlow AI pipeline. It tries three layers, in order:

A precise rule. Most common Alteryx tools have a dedicated, hand-written conversion. These are predictable and trustworthy — the node is labelled with the conversion source rule_engine.
A generic mapping. Some tools get the correct DataFlow type but, if their confidence is high enough, are carried across with a generic configuration rather than a full translation.
The AI assistant (LLM). Only genuinely unusual tools — confidence below 0.60 — are handed to an actual AI model (Anthropic's Claude). Those nodes are labelled llm.

That label tells the reviewer instantly how each node was produced.

The tool-by-tool mapping table

Here is the heart of the conversion — how each Alteryx tool becomes a DataFlow node. Almost every conversion turns a visual tool into a piece of SQL, the standard language databases speak, which is what makes the result fast and cloud-ready.

Alteryx tool	What it does, in plain words	Becomes (DataFlow node type)	Confidence	What gets built
Input Data (`AlteryxInput`, `DbFileInput`, `FileInput`)	Reads data in from a file or database	`source`	0.90	A source definition with the file, connection, and table
Output Data (`AlteryxOutput`, `DbFileOutput`, `FileOutput`)	Writes the finished data out	`sink`	0.90	A target definition
Select	Keeps, drops, or renames columns	`column_select`	0.95	The list of kept fields
Filter	Drops rows that fail a condition	`sql_where`	0.95	The condition becomes a SQL `WHERE` clause
Formula	Calculates new columns	`sql_expression`	0.80	A field-to-expression map, translated to SQL
Summarize	Totals, counts, averages per group	`sql_group_by`	0.90	"Group By" fields become `GROUP BY`; the rest become aggregates
Join	Combines two streams into one	`sql_join`	0.90	The join type (default `INNER`) and the join fields
Union	Stacks several streams into one	`sql_union_all`	0.95	A `UNION ALL` with the union mode
Sort	Puts rows in order	`sql_order_by`	0.98	A SQL `ORDER BY` on the chosen fields
Sample	Keeps only the first N rows	`sql_limit`	0.85	Generic config
Unique	Removes duplicate rows	`sql_distinct`	0.90	Generic config
CrossTab	Turns tall data into a wide table	`sql_pivot`	0.75	Generic config
Transpose	Turns a wide table into tall data	`sql_unpivot`	0.75	Generic config
RegEx	Pattern-matches and extracts from text	`regex_transform`	0.70	Generic config
DateTime	Reformats and parses dates	`datetime_transform`	0.85	Generic config
Multi-Field Formula	Applies one formula across many columns	`multi_field_expression`	0.75	Generic config
Multi-Row Formula	A formula that looks at the row above/below	`window_function`	0.65	Generic config — review recommended
Find Replace	Looks up and substitutes values	`string_replace`	0.80	Generic config
Text To Columns	Splits one text column into several	`split_column`	0.80	Generic config
Generate Rows	Creates rows out of nothing (e.g. a date series)	`row_generator`	0.70	Generic config
Macro	A reusable sub-workflow (`.yxmc`)	`custom_transform`	0.40	Below 0.60 — handed to the AI assistant
Python Tool	Custom Python code	`custom_transform`	0.35	Handed to the AI assistant
R Script	Custom R statistical code	`custom_transform`	0.30	Handed to the AI assistant
Run Command	Runs an external program	`custom_transform`	0.30	Handed to the AI assistant

In plain terms

The high-confidence tools at the top of the table — Input, Select, Filter, Formula, Join, Summarize, Sort, Union, Output — are the everyday "bread and butter" of any Alteryx workflow. These convert by trusted rule, with no fuss. It is only the special tools — macros, Python, R — that need a human's attention.

What converts automatically vs. what needs review

For EksportujDoBazyWsparcia.yxmd, the picture looks like this:

Converts automatically (the ~80%) — the four parallel streams are built mostly from Input, Select, Filter, Formula, Join, Summarize, Sort, and Union tools. Every one of these has a precise rule, converts as rule_engine, and scores 0.80 or above. The Excel and Teradata inputs become source nodes; the MSSQL and Teradata outputs become sink nodes (both at a fixed 0.95 confidence).

Needs a human (the ~20%) — the workflow uses custom macros that must be inspected and rewritten by hand, because a macro's logic lives inside its own .yxmc file, which the engine cannot see from the .yxmd alone:

Six copies of CountRecords.yxmc — a row-counting macro.
Six copies of Log_BDL_LH.yxmc — a logging macro.
One copy of IdImportu.yxmc — an import-ID macro.
One plkPrzesylki_Aktualizacja_BDL stored procedure that runs as a Post-SQL step on a database.

Each macro tool is converted to a custom_transform placeholder at confidence 0.40 and routed to the AI assistant, which produces a starting skeleton. But a human must open the original .yxmc files, understand what they do, and finish the conversion properly.

Watch out

A custom .yxmc macro cannot be migrated by reading the workflow alone — its real logic is inside the macro's own file. The Migration Center is honest about this: it marks every macro node as custom_transform, scores it low, and flags it for manual work rather than guessing. Any node typed custom_transform automatically blocks an unattended release.

Step 5 — Confidence scoring and the 0.85 release gate

Every converted node gets a confidence score between 0 and 1 — the engine's honest estimate of how sure it is. A Sort scores 0.98; a Macro scores 0.40. The overall confidence is the average of all node scores.

Confidence controls whether the migration may finish on its own. This is the release gate, set at 0.85. After conversion, the Migration Center collects a list of blocking issues. The job is blocked if any of these is true:

The overall confidence is below 0.85.
Any single object converted below 0.80 confidence.
Any object is typed custom_transform, or carries unresolved issues.
Validation found a problem.

If the blocking list is empty, the job is marked COMPLETED. If it has even one entry, the job is marked FAILED — and the error message names exactly which gate tripped.

For EksportujDoBazyWsparcia.yxmd, the seven-plus macro nodes are all custom_transform. That alone blocks an unattended release, so this job is FAILED by the gate. That is not a problem — it is the system working correctly. It means "an engineer must finish the macro work before this ships." Marek expects this for a Medium-High workflow; only the simplest workflows sail straight through to COMPLETED.

Outcome bucket	What it means for the engineer
Fully Automatic (~58% of all workflows)	Converted and ready — under 30 minutes of review
AI-Assisted (~27%)	The AI generated it; review and approve — 2–4 hours
Manual Work (~12%)	Custom logic, macros, plugins — hands-on, 1–3 days
Not Supported (~3%)	No migration path; must be redesigned

Alteryx workflows as a group convert in the 75%–82% range. EksportujDoBazyWsparcia.yxmd lands at 78–82% automatic, with roughly one day of manual work for the macros and stored procedure.

In plain terms

Think of the gate like airport security for your data pipeline. The straightforward bags go straight through. Anything unusual — the macros here — is set aside for a human to open and check. The Migration Center would rather pause and ask than wave through something it does not fully understand.

Step 6 — Validation: six independent safety checks

While the job was VALIDATING, two layers of checks ran.

Structural checks (the Pipeline Validator) confirm the generated pipeline file is well formed:

YAML syntax — the file parses, and its top level is a proper structure.
Required keys — the pipeline has a name and a list of nodes.
Node schema — every node has a unique id and a valid type (source, transform, sink, or quality).
Edge references — every arrow points at nodes that exist.
SQL safety scan — every piece of SQL is scanned for dangerous commands (DROP TABLE, TRUNCATE TABLE, ALTER TABLE, and the like).
Connector names — each connector is a recognised type.

Business checks (the six-check suite) run when Marek opens the Validation Suite screen:

#	Check	Passes when
1	YAML syntax	The structural validator reports the pipeline is valid
2	SQL safety	No dangerous-SQL warnings were raised
3	Transform coverage	At least 50% of objects converted with confidence ≥ 0.80
4	Confidence score	Overall confidence is at least 0.60
5	Node completeness	The pipeline has at least one source and one sink
6	Mapping issues	No converted object carries an unresolved issue

For EksportujDoBazyWsparcia.yxmd, the structural checks pass and most business checks pass — but check 6 (mapping issues) will flag the unresolved macro nodes. That is the same message the release gate gave, restated as a validation result: finish the macros.

Step 7 — Reviewing the converted pipeline and finishing the macros

Marek opens the migration report on the AI Conversion screen. It lays out:

Total objects, split into auto-converted (confidence ≥ 0.8), needs-review (0.5–0.8), and manual-required (below 0.5).
A confidence distribution across bands from "high (90–100%)" down to "critical (0–39%)".
The full list of object mappings — each Alteryx tool type, its new DataFlow type, its confidence, and its conversion source (rule_engine or llm).
Validation issues and warnings.
An effort estimate: about 0.25 hours per automatic node, 1 hour to review each AI-assisted node, 2 hours per fully-manual node, plus testing and a fixed 4 hours of integration testing.
Manual-review items with tailored guidance — for a macro, the guidance is to re-implement it as a Python user-defined function or as inline SQL.

For EksportujDoBazyWsparcia.yxmd, the report shows the four parallel streams cleanly auto-converted, and the macros and stored procedure flagged as the manual work. Marek now does that work:

For each CountRecords.yxmc and Log_BDL_LH.yxmc macro, he opens the original .yxmc file in Alteryx, sees what it does (counting rows, writing a log entry), and re-builds that small piece of logic natively in DataFlow AI — as a quality check, a small SQL transform, or a Python UDF.
For IdImportu.yxmc, he does the same — recreates the import-ID logic.
For the plkPrzesylki_Aktualizacja_BDL stored procedure, he keeps the procedure on the database and configures the DataFlow pipeline to call it as a Post-SQL step, or rewrites it as inline SQL.
He maps the environment-aware routing (the workflow behaved differently in test vs. production) onto DataFlow AI's native parameter and environment features rather than a macro.

Watch out

Always check the conversion source column in the report. A node marked rule_engine came from a precise, tested rule — trust it at a glance. A node marked llm was the AI assistant's best effort — usually good, but a human must read it, understand it, and approve it. A custom_transform node is a placeholder that must be finished by hand before the pipeline ships.

Step 8 — Deploying the converted pipeline

Once the macros are finished and the report is clean, Marek downloads the result. On the report screen he clicks Download, chooses the YAML format, and receives a file named EksportujDoBazyWsparcia_converted.yaml.

That YAML file is the new pipeline — a clean, human-readable description of every source, transformation, and target:

pipeline:
  name: EksportujDoBazyWsparcia
  description: Converted from Alteryx Designer
  schedule: manual
  sources:
    - id: src_excel_bdl
      type: source
      connector: generic
    - id: src_teradata_bdl
      type: source
      connector: teradata
  transformations:
    - id: filter_active_shipments
      type: transform
      depends_on: [src_teradata_bdl]
    - id: join_bdl_streams
      type: transform
      depends_on: [filter_active_shipments, src_excel_bdl]
  targets:
    - id: sink_mssql_wsparcie
      type: sink
      connector: mssql
      depends_on: [join_bdl_streams]
  quality_checks:
    - type: row_count_check
    - type: null_check
    - type: duplicate_check
    - type: schema_validation

Notice the quality checks at the bottom — a row-count check, a null check, a duplicate check, and a schema check — added automatically as guardrails. (Because some nodes converted below 0.80, a data_reconciliation check is also added, naming the flagged transforms so they get extra scrutiny.)

Marek imports the YAML into a DataFlow AI workspace. The pipeline appears in the Design Studio as a visual diagram — boxes and arrows, much like the original Alteryx canvas Joanna knows. There he sets its real schedule (the report used to be run on demand, so the team agrees a daily cron schedule), fills in connections and parameters, and saves. Saving commits the pipeline to Git — so for the first time ever, Joanna's workflow is version-controlled, centrally stored, and runs on its own without anyone opening Alteryx.

Step 9 — Verifying the results with a parallel run

Marek does not simply switch off the old Alteryx workflow. The Polkomtel migration programme requires a parallel run.

For a minimum of two weeks, both versions run against the same input data: Joanna's original .yxmd in Alteryx, and the new DataFlow AI pipeline. DataFlow AI's Parallel Run & Data Parity Validation compares the two outputs across five dimensions:

Exact row count — same number of rows out?
Checksum — an MD5/SHA-256 fingerprint of all the data.
Column sampling — a random 1,000 rows compared column by column.
Null counts — empty-value counts per column must match.
Execution time — the new pipeline must finish within 2× the old one's time.

The tool produces a clear pass/fail report. Once EksportujDoBazyWsparcia clears Polkomtel's output-parity threshold for the full window, the team signs off, switches fully to DataFlow AI, and the Alteryx licence for that workflow can eventually be retired. If anything ever looks wrong, rolling back is a single configuration change.

In plain terms

A parallel run is like running the new machine right next to the old one for two weeks and checking, every single day, that they produce identical products. Only when the new machine has earned that trust does the old one get switched off.

The whole journey at a glance

Here is the complete trip for Joanna's Alteryx workflow:

Locate EksportujDoBazyWsparcia.yxmd — the .yxmd file is the workflow.
Upload it to the Migration Center's Import Wizard.
The Migration Center parses the .yxmd into a tool-neutral description.
The Rule Engine converts each tool into a DataFlow node — by precise rule, generic mapping, or AI assistant.
Every node gets a confidence score; the 0.85 release gate decides COMPLETED versus FAILED.
Validation runs six independent safety checks.
Marek reviews the report and finishes the macros by hand, re-implementing each as a Python UDF or inline SQL.
He downloads the converted YAML, imports it into a workspace, sets the schedule and connections, and deploys it.
A two-week parallel run proves the new pipeline matches the old one row-for-row, after which the Alteryx workflow is retired.

Multiply that across Polkomtel's 50–100 Alteryx workflows, and the company turns a scattered collection of laptop-bound .yxmd files into a governed, scheduled, version-controlled set of cloud pipelines — and retires a $200,000-a-year licence along the way.

Frequently asked questions

Why can't macros be migrated automatically? A macro's logic lives inside its own separate .yxmc file. The Migration Center reads the .yxmd workflow, which only references the macro by name — it cannot see inside it. So macros are flagged as manual work; an engineer opens the .yxmc, understands it, and rebuilds the logic natively.

My job said FAILED but the streams converted fine — what happened? FAILED means the release gate tripped — it needs a human. For an Alteryx workflow with macros, the custom_transform placeholder nodes automatically block an unattended release. Finish the macros, and the gate will pass.

Does the data move during a migration? No. Only the instructions move from Alteryx into DataFlow AI. The Excel files, the Teradata and MSSQL databases — all stay exactly where they were.

What happens to Python or R tools in a workflow? Like macros, they are converted to custom_transform placeholders and sent to the AI assistant for a starting skeleton. An engineer then reviews and finishes them — typically as a Python UDF in DataFlow AI.

Will the new pipeline look familiar to the analyst? Yes. In the Design Studio the converted pipeline is shown as a visual canvas of boxes and arrows — the same drag-and-connect idea Joanna used in Alteryx — but now it is scheduled, monitored, governed, and version-controlled.

Meet the people and the problem

What an Alteryx workflow actually is

The example we will follow: EksportujDoBazyWsparcia.yxmd

Step 1 — Locate and prepare the .yxmd file

Step 2 — Upload the file to the Migration Center

Step 3 — Parsing: reading the canvas into a common language

Step 4 — Converting: turning each Alteryx tool into a DataFlow node

The tool-by-tool mapping table

What converts automatically vs. what needs review

Step 5 — Confidence scoring and the 0.85 release gate

Step 6 — Validation: six independent safety checks

Step 7 — Reviewing the converted pipeline and finishing the macros

Step 8 — Deploying the converted pipeline

Step 9 — Verifying the results with a parallel run

The whole journey at a glance

Frequently asked questions

The example we will follow: `EksportujDoBazyWsparcia.yxmd`