
Business value & ROI

DataFlow AI is an AI-native ETL/ELT platform built to replace Informatica PowerCenter and Alteryx Designer at Polkomtel Plus — removing legacy vendor lock-in, enabling real-time data integration, and closing GDPR/RODO compliance gaps. This page explains the business case: what hurts today, what changes, what it costs, and how success is measured. Where the source documents disagree on numbers, both figures are shown and the discrepancy is flagged.


What DataFlow AI is

DataFlow AI Platform is a cloud-native enterprise ETL/ELT platform: software that extracts data from source systems, transforms it (cleans, joins, aggregates), and loads it into a destination such as a data warehouse. "ELT" reorders the last two steps: raw data is loaded first and then transformed inside the target database engine.

It is positioned as a single platform that combines:

  • The workflow automation of Informatica PowerCenter (batch ETL).
  • The self-service analytics of Alteryx Designer (analyst data prep).
  • Real-time streaming via change data capture (CDC).
  • AI-assisted development through an in-product Copilot.
  • Comprehensive data governance (lineage, quality, GDPR/RODO controls).

The vendor is auraliscode, a Warsaw-based data-integration company. The target client is Polkomtel Plus, a major Polish telecom operator. Under the engagement model, platform delivery is fixed-price and source-code ownership transfers to Polkomtel on completion — there is no permanent dependency on the vendor.

A note on jargon

ETL = Extract, Transform, Load. CDC (Change Data Capture) = streaming every insert/update/delete out of a database as it happens, instead of copying the whole table on a schedule. TCO = Total Cost of Ownership, the all-in cost over a multi-year period. DSAR = Data Subject Access Request, a person's legal request to see or delete their personal data.
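
To make the CDC definition concrete, the sketch below replays a short stream of change events against an in-memory copy of a table. The event shape (`op`, `key`, `row`), the sample rows, and the function names are illustrative assumptions, not DataFlow AI's actual event format.

```python
from typing import Any

# Hypothetical change-event shape:
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}
ChangeEvent = dict[str, Any]


def apply_change(replica: dict[Any, dict], event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("insert", "update"):
        # Upsert: the event is assumed to carry the full new row image.
        replica[event["key"]] = dict(event["row"])
    elif op == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replay(events: list[ChangeEvent]) -> dict[Any, dict]:
    """Build the current table state by applying events in arrival order."""
    replica: dict[Any, dict] = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Made-up sample stream: two inserts, one update, one delete.
events = [
    {"op": "insert", "key": 1, "row": {"msisdn": "48600100100", "plan": "basic"}},
    {"op": "insert", "key": 2, "row": {"msisdn": "48600100200", "plan": "basic"}},
    {"op": "update", "key": 1, "row": {"msisdn": "48600100100", "plan": "premium"}},
    {"op": "delete", "key": 2},
]
state = replay(events)
print(state)
```

The contrast with a scheduled batch copy is that only four small events cross the wire here, rather than the whole table on every run.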

Source-document discrepancies

This page draws on two source documents, the business documentation and the RFI answer, that do not fully agree with each other. The differences are material, so they are reported openly rather than averaged away:

Topic | Business documentation | RFI answer
Vendor legal entity | auraliscode sp. z o.o. | auraliscode SA
Vendor contact domain | .pl emails | .eu emails
Software licensing model | $0 (open-source based) | $110K/yr platform license + consumption credits
3-Year TCO | ~$645K | $1,420K–$1,770K
Alteryx licensing cost | "~$200K/yr" | "$150K–$250K/yr"

Treat the cost sections below with this in mind: the two cost models are not reconcilable as written.


The legacy landscape at Polkomtel

Polkomtel Plus today runs two separate, ageing data-integration tools. Neither fits the company's stated cloud-first strategy.

Legacy tool | Estate | What it does | Licensing cost | Key problems
Informatica PowerCenter | 500+ active workflows | Batch ETL for business intelligence, the data warehouse, and regulatory reporting | $1.2M–$2M/yr | On-premises only; manual version control; technical debt accumulated since 2008–2012; integrates SAP HANA, Oracle, Teradata, MSSQL
Alteryx Designer | 50–100 analyst workflows (.yxmd files) | Self-service analytics and marketing data prep | ~$200K/yr (also cited as $150K–$250K/yr) | Used by analysts, not engineers; workflows shared as files with no central repository; no version control or data lineage

"On-premises" means the software runs on servers Polkomtel owns and operates in its own data center, rather than in a cloud provider's data center.

Five strategic pain points

These are the concrete problems the platform is designed to solve.

  1. Vendor lock-in and licensing cost. Informatica is priced per-CPU and per-connector; the bill has grown roughly 15–20% per year. There is no exit path — the licensing cycle is perpetual.
  2. No real-time capabilities. PowerCenter is batch-only. Capturing changes from Oracle, PostgreSQL, MSSQL, MySQL, or MongoDB as they happen (CDC) requires separate, expensive Informatica PowerExchange licenses that Polkomtel does not currently hold.
  3. Slow development cycles. Building a new Informatica workflow takes 3–10 days. Alteryx workflows are ungoverned, file-based, and cannot be reliably promoted to production.
  4. GDPR / RODO compliance gaps. There is no automated detection of personal data (PII), no column-level lineage to answer data-subject requests, no automated retention enforcement, no tamper-proof audit logs, and no built-in data masking. ("RODO" is the Polish name for the GDPR.)
  5. Cloud migration blockers. PowerCenter's on-premises architecture conflicts with Polkomtel's GCP cloud-first strategy. Even Informatica's own cloud product (IICS/IDMC) is a costly re-architecture that does not solve the cost problem.
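
Pain point 4 turns on column-level lineage: answering a data-subject request means finding every downstream column derived from a personal-data column. The following is a minimal sketch of that traversal, assuming lineage is available as a list of (source column, target column) edges. The column names and the function `downstream_columns` are hypothetical and do not describe the platform's internal model.

```python
from collections import defaultdict, deque


def downstream_columns(edges: list[tuple[str, str]], start: str) -> set[str]:
    """Return every column reachable from `start` by following lineage edges (BFS)."""
    graph: dict[str, list[str]] = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    seen: set[str] = set()
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for target in graph.get(column, []):
            # Skip already-visited columns so cyclic lineage cannot loop forever.
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical lineage edges in "schema.table.column" form.
edges = [
    ("crm.customers.pesel", "dwh.dim_customer.pesel_hash"),
    ("crm.customers.pesel", "staging.kyc.pesel"),
    ("staging.kyc.pesel", "reports.uke_monthly.subscriber_id"),
    ("billing.invoices.amount", "dwh.fact_revenue.amount"),
]
affected = downstream_columns(edges, "crm.customers.pesel")
print(sorted(affected))
```

Without such a graph, the same question has to be answered by reading individual mappings by hand, which is the gap the pain point describes.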

Why the RFI happened

The engagement originates from a Polkomtel RFI ("Request for Information") titled "POC ETL Tool for DWH Polkomtel". Three drivers were stated:

  • A. The data infrastructure is moving to the cloud and needs cloud-adapted integration tooling.
  • B. The current ETL tool's technical support and development are ending, which limits further operation under Polkomtel's operational and security requirements.
  • C. Greater flexibility, scalability, and automation are needed to keep up with growing data volumes and a wider variety of sources.

The objective: select a solution that enables secure, scalable, efficient data processing in both cloud and on-premises architectures with long-term operational stability.


Target users and their pain points

DataFlow AI is built for five user roles, each with a tailored dashboard and permission set. Sign-in is through Keycloak with single sign-on (SSO) into Active Directory / Azure AD. The five roles map to 26 granular permissions.

Role | Headcount at Polkomtel | What they do | Pain point removed
Admin (Platform Administrator) | 2–3 senior platform/DevOps engineers | User provisioning, connector and credential management, environment config, health monitoring, GCP cost management, incident response | Manual incident response: the platform self-heals known failure patterns before a human is involved
Data Engineer (Pipeline Developer) | 5–15 engineers | Build and tune pipelines, migrate Informatica/Alteryx workflows, manage data-quality rules, configure CDC streams | Repetitive boilerplate and slow debugging: the AI Copilot diagnoses failures in plain language (e.g. explains an ORA-01555 Oracle error and its fix)
Data Analyst (Self-Service Analytics) | Spread across marketing, revenue assurance, network planning, finance, compliance | Consume engineer-prepared data, ask questions in natural language, schedule queries; they do not build ETL | Dependency on the central data-engineering team: natural-language-to-SQL lets analysts self-serve
Data Steward (Governance & Compliance) | 2–4 people across data domains | Data-quality governance, GDPR/RODO compliance, catalog accuracy, DSAR responses, business-glossary upkeep | Slow DSAR handling: responses drop from days of manual investigation to 5–15 minutes per request via the DSAR API
Viewer (Read-Only Stakeholders) | Senior business stakeholders, audit/compliance officers, supervised external partners | Visibility only; no create or modify rights | Lack of visibility: read-only access to dashboards, monitoring, and the catalog
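
The role model can be pictured as a mapping from each role to a set of permissions. The sketch below is illustrative only: the source states that there are five roles and 26 permissions but does not list the permissions, so every permission name here is invented, and the real checks are presumably enforced through Keycloak rather than application code like this.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    DATA_ENGINEER = "data_engineer"
    DATA_ANALYST = "data_analyst"
    DATA_STEWARD = "data_steward"
    VIEWER = "viewer"


# Hypothetical permission names; the real 26 permissions are not listed in the source.
ROLE_PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.ADMIN: frozenset({"user.manage", "connector.manage", "pipeline.read", "catalog.read"}),
    Role.DATA_ENGINEER: frozenset({"pipeline.create", "pipeline.read", "cdc.configure", "catalog.read"}),
    Role.DATA_ANALYST: frozenset({"query.run", "query.schedule", "catalog.read"}),
    Role.DATA_STEWARD: frozenset({"quality.manage", "dsar.respond", "catalog.edit", "catalog.read"}),
    Role.VIEWER: frozenset({"dashboard.read", "catalog.read"}),
}


def is_allowed(role: Role, permission: str) -> bool:
    """Return True if the role's permission set contains the given permission."""
    return permission in ROLE_PERMISSIONS[role]


print(is_allowed(Role.DATA_ANALYST, "pipeline.create"))  # analysts do not build ETL
print(is_allowed(Role.DATA_ENGINEER, "pipeline.create"))
```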

The user interface is fully bilingual (Polish/English) across seven translation namespaces, and the AI Copilot accepts and answers in both languages. The proposed default is Polish for business users and English for engineering.


Business use cases

What Polkomtel will actually do with the platform:

  • Legacy ETL migration — automated conversion of 500+ Informatica workflows and 50–100 Alteryx workflows.
  • Real-time analytics enabled by CDC — live subscriber churn scoring while a customer browses the website, live network anomaly detection, and fraud signals on individual transactions, instead of once-daily batch sweeps.
  • Telecom CDR / ASN.1 processing — a purpose-built decoder that converts binary Call Detail Records (CDRs) emitted by network elements into Parquet files. This is cited as unique among commercial ETL platforms.
  • Regulatory reporting — E112 emergency-services call routing and reporting, UKE (Polish telecom regulator) reporting, and built-in handling of the Polish national identifier (PESEL).
  • Self-service analytics — analysts query data in plain language, e.g. "top 10 most profitable roaming destinations last quarter by revenue per subscriber."
  • Multi-tenant domain isolation — separate workspaces per business domain: Network Operations, Revenue Assurance, Marketing Analytics, Finance Reporting, Regulatory Compliance.
  • Net-new data products — after migration, the freed-up engineering capacity builds new things: real-time customer analytics, AI-powered network operations, advanced fraud detection.
  • Knowledge preservation — the migration program produces Polkomtel's first complete inventory of data-integration dependencies plus human-readable business-rule documentation, stored in the Data Catalog.

Migration economics

Migration is automated where possible and AI-assisted where it is not. The 500+ Informatica workflows fall into four categories by difficulty.

Category | Share of estate | Effort per workflow | Risk
Fully Automatic | 58% | < 30 min review | Zero migration risk
AI-Assisted | 27% | 2–4 hours (80–95% AI-generated) | Combined with automatic = 85%+ coverage
Manual Migration | 12% | 1–3 days | Engineering involvement required
Re-Architecture Required | 3% | 3–10 days | Custom C++ plugins, PowerExchange mainframe CDC; a property of the legacy code, not a DataFlow AI limitation

The full migration program is estimated at roughly 2,545 person-hours across ~550–600 workflows, run by a team of 3–4 senior engineers over 6–8 months.
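
As a rough plausibility check on the person-hour estimate, the category shares and per-workflow effort ranges above can be multiplied out. This is a sketch under stated assumptions, not the vendor's estimation method: an engineering day is taken as 8 hours, and the "< 30 min" review is taken at its upper bound of 0.5 hours.

```python
HOURS_PER_DAY = 8  # assumption: one engineering day = 8 working hours

# category: (share of estate, low hours per workflow, high hours per workflow)
CATEGORIES = {
    "fully_automatic": (0.58, 0.5, 0.5),  # < 30 min review, taken at the upper bound
    "ai_assisted": (0.27, 2.0, 4.0),  # 2-4 hours
    "manual": (0.12, 1 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),  # 1-3 days
    "re_architecture": (0.03, 3 * HOURS_PER_DAY, 10 * HOURS_PER_DAY),  # 3-10 days
}


def effort_range(workflow_count: int) -> tuple[float, float]:
    """Return (low, high) total person-hours for the given number of workflows."""
    low = sum(share * lo for share, lo, _ in CATEGORIES.values()) * workflow_count
    high = sum(share * hi for share, _, hi in CATEGORIES.values()) * workflow_count
    return low, high


for count in (550, 600):
    low, high = effort_range(count)
    print(f"{count} workflows: {low:,.0f} to {high:,.0f} person-hours")
```

Under these assumptions the quoted figure of roughly 2,545 person-hours falls inside the computed range for both 550 and 600 workflows.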

Validated workflow samples

Three real Polkomtel workflows were used to validate the migration engine. Note the two remaining-effort figures per row: the business documentation and the RFI answer agree on the auto-conversion percentage but estimate the remaining effort very differently.

Workflow | Tool | Complexity | Auto-conversion | Remaining effort
wf_SAP_Replika_l_BIURO_SPRZEDAZY_PLK | Informatica PC | Simple (2.1/10) | 95% | ~30 min review (business doc) / 4–8 hrs (RFI)
wf_E112 | Informatica PC | Very Complex (8.4/10) | 68–75% | 2–3 days (business doc) / 80–120 hrs (RFI)
EksportujDoBazyWsparcia.yxmd | Alteryx 2022.3 | Medium-High (5.8/10) | 78–82% | ~1 day (business doc) / 24–40 hrs (RFI)

Risk mitigation

  • Parallel run — every migrated workflow runs simultaneously in both DataFlow AI and Informatica for a minimum two-week validation window. Rollback is a single configuration change.
  • Automated regression testing — both versions run against identical input and outputs are compared row by row.
  • Quality gates — tiered by category: parallel run of 10–30 business days, an output-parity threshold of 99.5%–100%, and sign-off escalating to the Data Protection Officer and Compliance Officer for regulatory workflows.
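
The regression test and the output-parity gate described above can be sketched as a multiset comparison of the two outputs. How parity is actually defined by the platform is not stated in the source; the definition used here (matched rows divided by the larger row count) and the function names are assumptions.

```python
from collections import Counter


def output_parity(legacy_rows: list[tuple], migrated_rows: list[tuple]) -> float:
    """Fraction of rows that match between the two outputs, ignoring row order.

    Duplicate rows are counted individually. Dividing by the larger output
    penalises both missing rows and unexpected extra rows.
    """
    if not legacy_rows and not migrated_rows:
        return 1.0
    legacy, migrated = Counter(legacy_rows), Counter(migrated_rows)
    matched = sum((legacy & migrated).values())  # multiset intersection
    return matched / max(len(legacy_rows), len(migrated_rows))


def passes_gate(parity: float, threshold: float = 0.995) -> bool:
    """Apply the lower end of the 99.5%-100% parity threshold by default."""
    return parity >= threshold


# Made-up outputs: 1,000 legacy rows, of which the migrated run reproduces 996.
legacy_output = [(i, i * 2) for i in range(1000)]
migrated_output = legacy_output[:996] + [(i, -1) for i in range(996, 1000)]

parity = output_parity(legacy_output, migrated_output)
print(f"parity = {parity:.3%}, gate passed = {passes_gate(parity)}")
```

For regulatory workflows the threshold argument would be raised to 1.0, so a single differing row blocks sign-off.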

A useful side effect

Because migration forces every legacy workflow to be analysed and documented, the program delivers Polkomtel's first complete, human-readable inventory of its data-integration dependencies — a knowledge asset that did not previously exist.


ROI and TCO — two cost models

The two source documents present materially different cost models. Both are reported in full. Do not combine them.

Model A — Business documentation ($0 software licensing)

This model treats DataFlow AI as open-source-based with no software license fee.

Year | Current spend (Informatica + Alteryx) | DataFlow AI total cost | Annual saving
Year 1 | ~$2.5M–$3.5M | ~$315K | ~$2.2M+
Year 2 | ~$1.6M–$2.2M | ~$165K | ~$1.4M+
Year 3 | ~$1.6M–$2.2M | ~$165K | ~$1.4M+
3-Year total | ~$6.7M–$9.4M | ~$645K | ~$6M–$8.7M saved

Cost line items under this model:

Cost item | Current | DataFlow AI | Annual saving
Software licensing | $1.2M–$2M (Informatica) + $200K (Alteryx) | $0 (open-source based) | $1.4M–$2.2M/yr
Cloud infrastructure | On-prem hardware ~$150K–$300K/yr | $165K/yr (realistic GCP) | Comparable; no capital expenditure
Professional services / support | $300K–$500K/yr | Included in platform delivery | $300K–$500K/yr
Migration (one-time) | N/A | ~$150K (8 months × 4 engineers) | One-time
Training & change management | ~$80K/yr | ~$30K (Year 1 only) | $50K–$80K ongoing

Headline figures from this model: Year 1 net saving $2.2M+; break-even in Month 3–4 after go-live; 3-Year ROI over 1,000% ($6–8.7M saved on $645K invested).

Conservative 3-year ROI broken down by benefit category:

Benefit category | 3-Year total | Confidence
Informatica licensing eliminated | $3,600K–$6,000K | High
Alteryx licensing eliminated | $450K–$750K | High
Informatica professional services / support eliminated | $900K–$1,500K | High
On-premises hardware reclaimed | $210K–$420K | Medium
Engineering productivity (60–80% faster pipeline development) | $600K–$1,200K | Medium
GDPR compliance risk reduction | $300K–$1,500K | Estimated
DSAR process automation (60+ DSARs/month) | $150K–$300K | High
Less: DataFlow AI total cost | ($645K)
Net benefit (conservative) | $5,565K+

Engineering productivity is valued at PLN 200K per engineer per year (the Polish senior data-engineer market rate). The GDPR figure uses a minimum UODO fine estimate.

Model B — RFI answer (platform fee + consumption credits)

This model prices DataFlow AI as a platform license + consumption credits + support — not $0 licensing.

Component | Year 1 | Year 2 | Year 3 | 3-Year total
Platform license (all connectors, SSO, governance) | $110K | $110K | $110K | $330K
Consumption credits (500 pipelines, 15–25 TB daily) | $160K | $180K | $200K | $540K
Premium support + dedicated Technical Account Manager | $40K | $40K | $40K | $120K
Annual subtotal | $310K | $330K | $350K | $990K

On top of the recurring cost, the RFI quotes a one-time Year 1 migration cost of $430K–$780K: PowerCenter migration $200K–$350K, Alteryx $50K–$100K, manual remediation (~20%) $100K–$200K, parallel run and validation $50K–$100K, and training for 20 users $30K.

Putting it together, the RFI total cost summary is: Year 1 (license + migration) $740K–$1,090K, Year 2 $330K, Year 3 $350K — a 3-Year TCO of $1,420K–$1,770K.
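
The Model B totals can be reproduced from the line items quoted above. The snippet is only an arithmetic check on the RFI figures; the dictionary keys are labels chosen for this example.

```python
# All figures in thousands of USD, taken from the Model B table and migration quote.
RECURRING = {  # (Year 1, Year 2, Year 3)
    "platform_license": (110, 110, 110),
    "consumption_credits": (160, 180, 200),
    "premium_support": (40, 40, 40),
}
MIGRATION_ONE_TIME = {  # (low, high), Year 1 only
    "powercenter_migration": (200, 350),
    "alteryx_migration": (50, 100),
    "manual_remediation": (100, 200),
    "parallel_run_and_validation": (50, 100),
    "training_20_users": (30, 30),
}

annual = [sum(item[year] for item in RECURRING.values()) for year in range(3)]
migration_low = sum(low for low, _ in MIGRATION_ONE_TIME.values())
migration_high = sum(high for _, high in MIGRATION_ONE_TIME.values())

year1_low, year1_high = annual[0] + migration_low, annual[0] + migration_high
tco_low, tco_high = sum(annual) + migration_low, sum(annual) + migration_high

print("Annual recurring:", annual)  # [310, 330, 350]
print("One-time migration:", migration_low, "-", migration_high)  # 430 - 780
print("Year 1 total:", year1_low, "-", year1_high)  # 740 - 1090
print("3-year TCO:", tco_low, "-", tco_high)  # 1420 - 1770
```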

The RFI also compares DataFlow AI against the alternatives:

Option | RFI 3-Year TCO
DataFlow AI | $1.4M–$1.8M
Informatica IDMC | $2.4M–$4.5M
Qlik Talend (including CDC) | $1.5M–$2.5M
Keep current tools (Informatica PC + Alteryx) | $1.5M–$2.0M, with rising end-of-life risk

The two models do not agree

Model A claims $0 software licensing and a 3-Year TCO of ~$645K. Model B charges a $110K/yr platform license plus consumption credits and lands at a 3-Year TCO of $1.42M–$1.77M. These are not reconcilable. When quoting a TCO figure, always state which model it comes from.

Non-financial benefits

Both models agree on the qualitative benefits:

  • Faster time-to-insight — moving from batch to real-time enables same-day reporting and live dashboards.
  • Engineering capacity freed up — 60–80% faster pipeline development means 3–5× more data products per year from the same team.
  • Lower regulatory risk — UODO (the Polish data-protection authority) can fine up to PLN 100K per violation, and systemic violations up to 4% of annual global turnover.
  • Strategic asset ownership — Polkomtel owns the platform code and can extend it.
  • AI competitive advantage — native Anthropic Claude integration.

KPIs and success criteria

Success is defined as a measurable target state 12 months after go-live.

Success metric | Current state | 12-month target
Informatica workflows migrated and in production | 0 | 500+ (100%)
Informatica PowerCenter status | Fully operational | Decommissioned (licensing cancelled)
Alteryx workflows migrated | 0 | 50–100 (100%)
Annual licensing cost reduction | $0 | $1.4M–$2.2M/year saved
Real-time CDC streams operational | 0 | 5+ (Oracle, PostgreSQL, MSSQL, MySQL, MongoDB)
Average new pipeline development time | 3–10 days | < 1 day (AI-assisted)
DSAR response time | 3–5 business days (manual) | < 5 minutes (automated)
Active data-quality rules | 0 | 200+ rules across all critical datasets
Column-level lineage coverage | 0% | > 90% of pipeline-managed datasets
GDPR-compliant PII classification | Manual, incomplete | Automated; all new data classified on ingestion
Production platform uptime | N/A | > 99.95% (Professional SLA target)
Active users (all roles) | 0 | 30–50+

Maturity comparison

A side-by-side of where Polkomtel is today versus where DataFlow AI takes it:

Dimension | Current (Informatica) | Target (DataFlow AI)
Processing latency | Batch only (daily/hourly) | Real-time (milliseconds, via CDC + Flink)
Development speed | 3–10 days per pipeline | 2–8 hours per pipeline (AI-assisted)
Data governance | Manual, spreadsheet-based | Automated, policy-driven, RODO compliant
Lineage visibility | None (black-box mappings) | Column-level, full history
Self-service access | Engineering-mediated only | AI Copilot for any authorized user
Infrastructure model | Fixed on-premises servers | Auto-scaling serverless GCP
Cost structure | High fixed licensing | Low variable cloud consumption
Vendor independence | Fully dependent on Informatica | Open source plus owned code

Support and SLA tiers

auraliscode offers three support tiers. The "SLA" (Service Level Agreement) is the contractual promise on uptime and response time. "RPO" is the maximum acceptable data loss measured in time; "RTO" is the maximum acceptable time to restore service.

Tier | Uptime | Coverage | P1 response | Channels | DR target (RPO / RTO)
Standard | 99.9% | 9×5 (Mon–Fri, 08:00–17:00 CET) | 4 hours | Email + portal | RPO 4h / RTO 1h
Professional | 99.95% | 24×7 | 1 hour | Email + phone + portal; dedicated Customer Success Manager | RPO 2h / RTO 30 min
Enterprise | 99.99% | 24×7×365 | 15 minutes | All of the above + dedicated Slack; up to 2 on-site engineer days/month; custom SLA with financial penalties | RPO 30 min / RTO 15 min
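
An uptime percentage is easier to judge once converted into the downtime it permits. The conversion below assumes the SLA is measured over a 365-day year; the source does not state the measurement period or whether scheduled maintenance counts against it.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: measured over a non-leap year

TIERS = {"Standard": 99.9, "Professional": 99.95, "Enterprise": 99.99}


def allowed_downtime_minutes(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Minutes of downtime permitted in the period at the given uptime percentage."""
    return (1 - uptime_percent / 100) * period_hours * 60


for tier, uptime in TIERS.items():
    minutes = allowed_downtime_minutes(uptime)
    print(f"{tier}: {uptime}% uptime allows about {minutes:.0f} minutes of downtime per year")
```

On that basis the three tiers allow roughly 8.8 hours, 4.4 hours, and 53 minutes of downtime per year respectively.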

Priority levels: P1 Critical (outage or data corruption, no workaround), P2 High (major feature down, workaround exists), P3 Medium (minor degradation), P4 Low (questions and feature requests).

Disaster recovery uses a dual-region GCP setup — europe-central2 (Warsaw) as primary and europe-west3 (Frankfurt) as a standby. Both regions are inside the EU, so GDPR data residency is preserved even after a failover. Scheduled maintenance is capped at 4 hours per calendar month, announced 48 hours in advance, and scheduled between 02:00 and 06:00 CET.

GDPR / RODO is built in, not bolted on

Every feature that touches personal data ships with built-in controls: automatic PII detection on schema discovery (PESEL, IMSI/IMEI, biometric, location, financial, contact data, and GDPR Article 9 special categories), a DSAR API that traces a person's data through the lineage graph, and tamper-proof audit logs meeting SOC 2 Type II standards. In normal operation all data stays in GCP europe-central2 (Warsaw); the disaster-recovery standby in europe-west3 (Frankfurt) is also inside the EU, so no data leaves the EU.
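
As an illustration of what PESEL detection on schema discovery could involve, the sketch below validates the public PESEL check digit, flags a column whose sampled values mostly validate, and masks a value. How DataFlow AI actually detects and masks PESEL is not described in the source; the function names, the 90% sampling threshold, and the masking format are assumptions.

```python
import re

PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
PESEL_PATTERN = re.compile(r"[0-9]{11}")


def is_valid_pesel(value: str) -> bool:
    """Check the 11-digit format and the check digit (the birth date is not validated)."""
    if not PESEL_PATTERN.fullmatch(value):
        return False
    digits = [int(ch) for ch in value]
    # zip stops after the ten weights, so the check digit itself is excluded.
    weighted = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    return (10 - weighted % 10) % 10 == digits[10]


def looks_like_pesel_column(sample: list[str], min_share: float = 0.9) -> bool:
    """Flag a column as likely PESEL if most non-empty sampled values validate."""
    values = [v.strip() for v in sample if v and v.strip()]
    if not values:
        return False
    valid = sum(is_valid_pesel(v) for v in values)
    return valid / len(values) >= min_share


def mask_pesel(value: str) -> str:
    """Keep the first two digits and mask the rest (hypothetical masking format)."""
    return value[:2] + "*" * (len(value) - 2)


sample_value = "44051401359"  # sample value that satisfies the checksum
print(is_valid_pesel(sample_value), mask_pesel(sample_value))
```

A checksum test of this kind cuts false positives compared with matching any 11-digit string, which matters when phone numbers and account identifiers sit in the same tables.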

