Introduction
Business value & ROI
DataFlow AI is an AI-native ETL/ELT platform built to replace Informatica PowerCenter and Alteryx Designer at Polkomtel Plus — removing legacy vendor lock-in, enabling real-time data integration, and closing GDPR/RODO compliance gaps. This page explains the business case: what hurts today, what changes, what it costs, and how success is measured. Where the source documents disagree on numbers, both figures are shown and the discrepancy is flagged.
What DataFlow AI is
DataFlow AI Platform is a cloud-native enterprise ETL/ELT platform — software that extracts data from source systems, transforms it (cleans, joins, aggregates), and loads it into a destination such as a data warehouse. "ELT" is the same idea with the transform step pushed down into the target database engine.
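The difference between ETL and ELT is easiest to see side by side. The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a stand-in for the target warehouse, and the table names, column names, and sample rows are all hypothetical. It does not show how DataFlow AI itself is implemented.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [("  Anna ", "120.50"), ("Piotr", "80.00"), ("  Ewa", "15.25")]


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: clean the rows in application code, then load the finished result."""
    cleaned = [(name.strip(), float(amount)) for name, amount in RAW_ROWS]
    conn.execute("CREATE TABLE etl_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", cleaned)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load the rows untouched, then let the target engine transform them."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", RAW_ROWS)
    # The transform step is plain SQL executed inside the target database.
    conn.execute(
        "CREATE TABLE elt_sales AS "
        "SELECT TRIM(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_sales"
    )


def fetch(conn: sqlite3.Connection, table: str) -> list:
    """Read a result table back in a stable order."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
```

Both routes end with the same cleaned table. What differs is where the cleaning runs: in the integration layer for ETL, in the database engine for ELT.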
It is positioned as a single platform that combines:
- The workflow automation of Informatica PowerCenter (batch ETL).
- The self-service analytics of Alteryx Designer (analyst data prep).
- Real-time streaming via change data capture (CDC).
- AI-assisted development through an in-product Copilot.
- Comprehensive data governance (lineage, quality, GDPR/RODO controls).
The vendor is auraliscode, a Warsaw-based data-integration company. The target client is Polkomtel Plus, a major Polish telecom operator. Under the engagement model, platform delivery is fixed-price and source-code ownership transfers to Polkomtel on completion — there is no permanent dependency on the vendor.
A note on jargon
ETL = Extract, Transform, Load. CDC (Change Data Capture) = streaming every insert/update/delete out of a database as it happens, instead of copying the whole table on a schedule. TCO = Total Cost of Ownership, the all-in cost over a multi-year period. DSAR = Data Subject Access Request, a person's legal request to see or delete their personal data.
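To make the CDC definition concrete, here is a minimal sketch of a consumer that keeps a replica in step by applying individual change events instead of re-copying the table. The `ChangeEvent` shape and the sample subscriber rows are assumptions made for illustration; real CDC tools emit richer events (before/after images, transaction metadata) in their own formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified change event."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes


def apply_change(replica: dict, event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    if event.op in ("insert", "update"):
        replica[event.key] = event.row
    elif event.op == "delete":
        replica.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


# A short stream of changes, applied in the order the source committed them.
events = [
    ChangeEvent("insert", 1, {"msisdn": "48600100100", "plan": "basic"}),
    ChangeEvent("insert", 2, {"msisdn": "48600100200", "plan": "basic"}),
    ChangeEvent("update", 1, {"msisdn": "48600100100", "plan": "premium"}),
    ChangeEvent("delete", 2),
]

replica: dict = {}
for event in events:
    apply_change(replica, event)
```

After the four events, the replica holds only subscriber 1 on the premium plan, and the source table was never scanned in full.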
Source-document discrepancies
This page draws on three source documents that do not fully agree with each other. The differences are material, so they are reported openly rather than averaged away:
| Topic | Business documentation | RFI answer |
|---|---|---|
| Vendor legal entity | auraliscode sp. z o.o. | auraliscode SA |
| Vendor contact domain | .pl emails | .eu emails |
| Software licensing model | $0 (open-source based) | $110K/yr platform license + consumption credits |
| 3-Year TCO | ~$645K | $1,420K–$1,770K |
| Alteryx licensing cost | "~$200K/yr" | "$150K–$250K/yr" |
Treat the cost sections below with this in mind: the two cost models are not reconcilable as written.
The legacy landscape at Polkomtel
Polkomtel Plus today runs two separate, ageing data-integration tools. Neither fits the company's stated cloud-first strategy.
| Legacy tool | Estate | What it does | Licensing cost | Key problems |
|---|---|---|---|---|
| Informatica PowerCenter | 500+ active workflows | Batch ETL for business intelligence, the data warehouse, and regulatory reporting | $1.2M–$2M/yr | On-premises only; manual version control; technical debt accumulated since 2008–2012; integrates SAP HANA, Oracle, Teradata, MSSQL |
| Alteryx Designer | 50–100 analyst workflows (.yxmd files) | Self-service analytics and marketing data prep | ~$200K/yr (also cited as $150K–$250K/yr) | Used by analysts, not engineers; workflows shared as files with no central repository; no version control or data lineage |
"On-premises" means the software runs on servers Polkomtel owns and operates in its own data center, rather than in a cloud provider's data center.
Five strategic pain points
These are the concrete problems the platform is designed to solve.
- Vendor lock-in and licensing cost. Informatica is priced per-CPU and per-connector; the bill has grown roughly 15–20% per year. There is no exit path — the licensing cycle is perpetual.
- No real-time capabilities. PowerCenter is batch-only. Capturing changes from Oracle, PostgreSQL, MSSQL, MySQL, or MongoDB as they happen (CDC) requires separate, expensive Informatica PowerExchange licenses that Polkomtel does not currently hold.
- Slow development cycles. Building a new Informatica workflow takes 3–10 days. Alteryx workflows are ungoverned, file-based, and cannot be reliably promoted to production.
- GDPR / RODO compliance gaps. There is no automated detection of personal data (PII), no column-level lineage to answer data-subject requests, no automated retention enforcement, no tamper-proof audit logs, and no built-in data masking. ("RODO" is the Polish name for the GDPR.)
- Cloud migration blockers. PowerCenter's on-premises architecture conflicts with Polkomtel's GCP cloud-first strategy. Even Informatica's own cloud product (IICS/IDMC) is a costly re-architecture that does not solve the cost problem.
Why the RFI happened
The engagement originates from a Polkomtel RFI ("Request for Information") titled "POC ETL Tool for DWH Polkomtel". Three drivers were stated:
- A. The data infrastructure is moving to the cloud and needs cloud-adapted integration tooling.
- B. The current ETL tool's technical support and development are ending, which limits further operation under Polkomtel's operational and security requirements.
- C. Greater flexibility, scalability, and automation are needed to keep up with growing data volumes and a wider variety of sources.
The objective: select a solution that enables secure, scalable, efficient data processing in both cloud and on-premise architectures with long-term operational stability.
Target users and their pain points
DataFlow AI is built for five user roles, each with a tailored dashboard and permission set. Sign-in is through Keycloak with single sign-on (SSO) into Active Directory / Azure AD. The five roles map to 26 granular permissions.
| Role | Headcount at Polkomtel | What they do | Pain point removed |
|---|---|---|---|
| Admin (Platform Administrator) | 2–3 senior platform/DevOps engineers | User provisioning, connector and credential management, environment config, health monitoring, GCP cost management, incident response | Manual incident response — the platform self-heals known failure patterns before a human is involved |
| Data Engineer (Pipeline Developer) | 5–15 engineers | Build and tune pipelines, migrate Informatica/Alteryx workflows, manage data-quality rules, configure CDC streams | Repetitive boilerplate and slow debugging — the AI Copilot diagnoses failures in plain language (e.g. explains an ORA-01555 Oracle error and its fix) |
| Data Analyst (Self-Service Analytics) | Spread across marketing, revenue assurance, network planning, finance, compliance | Consume engineer-prepared data, ask questions in natural language, schedule queries — they do not build ETL | Dependency on the central data-engineering team — natural-language-to-SQL lets analysts self-serve |
| Data Steward (Governance & Compliance) | 2–4 people across data domains | Data-quality governance, GDPR/RODO compliance, catalog accuracy, DSAR responses, business-glossary upkeep | Slow DSAR handling — responses drop from days of manual investigation to 5–15 minutes per request via the DSAR API |
| Viewer (Read-Only Stakeholders) | Senior business stakeholders, audit/compliance officers, supervised external partners | Visibility only — no create or modify rights | Lack of visibility — read-only access to dashboards, monitoring, and the catalog |
The user interface is fully bilingual (Polish/English) across seven translation namespaces, and the AI Copilot accepts and answers in both languages. The proposed default is Polish for business users and English for engineering.
Business use cases
What Polkomtel will actually do with the platform:
- Legacy ETL migration — automated conversion of 500+ Informatica workflows and 50–100 Alteryx workflows.
- Real-time analytics enabled by CDC — live subscriber churn scoring while a customer browses the website, live network anomaly detection, and fraud signals on individual transactions, instead of once-daily batch sweeps.
- Telecom CDR / ASN.1 processing — a purpose-built decoder that converts binary Call Detail Records (CDRs) emitted by network elements into Parquet files. This is cited as unique among commercial ETL platforms.
- Regulatory reporting — E112 emergency-services call routing and reporting, UKE (Polish telecom regulator) reporting, and built-in handling of the Polish national identifier (PESEL).
- Self-service analytics — analysts query data in plain language, e.g. "top 10 most profitable roaming destinations last quarter by revenue per subscriber."
- Multi-tenant domain isolation — separate workspaces per business domain: Network Operations, Revenue Assurance, Marketing Analytics, Finance Reporting, Regulatory Compliance.
- Net-new data products — after migration, the freed-up engineering capacity builds new things: real-time customer analytics, AI-powered network operations, advanced fraud detection.
- Knowledge preservation — the migration program produces Polkomtel's first complete inventory of data-integration dependencies plus human-readable business-rule documentation, stored in the Data Catalog.
Migration economics
Migration is automated where possible and AI-assisted where it is not. The 500+ Informatica workflows fall into four categories by difficulty.
| Category | Share of estate | Effort per workflow | Risk |
|---|---|---|---|
| Fully Automatic | 58% | < 30 min review | Zero migration risk |
| AI-Assisted | 27% | 2–4 hours (80–95% AI-generated) | Combined with automatic = 85%+ coverage |
| Manual Migration | 12% | 1–3 days | Engineering involvement required |
| Re-Architecture Required | 3% | 3–10 days | Custom C++ plugins, PowerExchange mainframe CDC — a property of the legacy code, not a DataFlow AI limitation |
The full migration program is estimated at roughly 2,545 person-hours across ~550–600 workflows, run by a team of 3–4 senior engineers over 6–8 months.
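As a rough plausibility check on the person-hour estimate, the category shares and per-workflow effort ranges from the table can be multiplied out. This is a sketch under stated assumptions: an 8-hour working day (not given in the source), the "< 30 min" review counted as a flat 0.5 hours, and the same shares applied to the whole ~550–600 workflow estate.

```python
HOURS_PER_DAY = 8  # assumption: the source does not define a working day

# category: (share of estate, low hours per workflow, high hours per workflow)
CATEGORIES = {
    "fully_automatic": (0.58, 0.5, 0.5),  # "< 30 min review", taken as 0.5 h
    "ai_assisted": (0.27, 2, 4),
    "manual": (0.12, 1 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),
    "re_architecture": (0.03, 3 * HOURS_PER_DAY, 10 * HOURS_PER_DAY),
}


def effort_range(workflows_low: int, workflows_high: int) -> tuple[float, float]:
    """Return (low, high) total person-hours across the estate."""
    low = workflows_low * sum(share * lo for share, lo, _ in CATEGORIES.values())
    high = workflows_high * sum(share * hi for share, _, hi in CATEGORIES.values())
    return low, high


low_hours, high_hours = effort_range(550, 600)
print(f"Estimated effort: {low_hours:,.0f} to {high_hours:,.0f} person-hours")
```

Under these assumptions the range runs from roughly 1,400 to 4,000 person-hours, so the quoted figure of about 2,545 sits inside it.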
Validated workflow samples
Three real Polkomtel workflows were used to validate the migration engine. Note the two remaining-effort figures per row: the business documentation and the RFI answer agree on the auto-conversion percentage but estimate the remaining effort differently.
| Workflow | Tool | Complexity | Auto-conversion | Remaining effort |
|---|---|---|---|---|
| wf_SAP_Replika_l_BIURO_SPRZEDAZY_PLK | Informatica PC | Simple (2.1/10) | 95% | ~30 min review (business doc) / 4–8 hrs (RFI) |
| wf_E112 | Informatica PC | Very Complex (8.4/10) | 68–75% | 2–3 days (business doc) / 80–120 hrs (RFI) |
| EksportujDoBazyWsparcia.yxmd | Alteryx 2022.3 | Medium-High (5.8/10) | 78–82% | ~1 day (business doc) / 24–40 hrs (RFI) |
Risk mitigation
- Parallel run — every migrated workflow runs simultaneously in both DataFlow AI and Informatica for a minimum two-week validation window. Rollback is a single configuration change.
- Automated regression testing — both versions run against identical input and outputs are compared row by row.
- Quality gates — tiered by category: parallel run of 10–30 business days, an output-parity threshold of 99.5%–100%, and sign-off escalating to the Data Protection Officer and Compliance Officer for regulatory workflows.
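The row-by-row comparison and the output-parity threshold described above can be sketched as follows. This is an illustration, not the platform's actual test harness: the function names are hypothetical, and it assumes both outputs have already been sorted into the same order so that rows can be compared by position.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for a row that exists in only one output


def output_parity(legacy_rows, migrated_rows) -> float:
    """Return the share of row positions where both outputs are identical."""
    pairs = list(zip_longest(legacy_rows, migrated_rows, fillvalue=_MISSING))
    if not pairs:
        return 1.0  # two empty outputs are trivially identical
    matching = sum(1 for old, new in pairs if old == new)
    return matching / len(pairs)


def passes_quality_gate(legacy_rows, migrated_rows, threshold: float = 0.995) -> bool:
    """Check parity against a gate; 0.995 is the lower bound quoted above."""
    return output_parity(legacy_rows, migrated_rows) >= threshold


# Example: 1,000 rows, 4 of which differ after migration.
legacy = [(i, i * 2) for i in range(1000)]
migrated = list(legacy)
for i in range(4):
    migrated[i] = (i, -1)

parity = output_parity(legacy, migrated)
```

In this example 996 of 1,000 rows match, which clears a 99.5% gate but fails a 100% gate of the kind a regulatory workflow might require.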
A useful side effect
Because migration forces every legacy workflow to be analysed and documented, the program delivers Polkomtel's first complete, human-readable inventory of its data-integration dependencies — a knowledge asset that did not previously exist.
ROI and TCO — two cost models
The two source documents present materially different cost models. Both are reported in full. Do not combine them.
Model A — Business documentation ($0 software licensing)
This model treats DataFlow AI as open-source-based with no software license fee.
| Year | Current spend (Informatica + Alteryx) | DataFlow AI total cost | Annual saving |
|---|---|---|---|
| Year 1 | ~$2.5M–$3.5M | ~$315K | ~$2.2M+ |
| Year 2 | ~$1.6M–$2.2M | ~$165K | ~$1.4M+ |
| Year 3 | ~$1.6M–$2.2M | ~$165K | ~$1.4M+ |
| 3-Year total | ~$6.7M–$9.4M | ~$645K | ~$6M–$8.7M saved |
Cost line items under this model:
| Cost item | Current | DataFlow AI | Annual saving |
|---|---|---|---|
| Software licensing | $1.2M–$2M (Informatica) + $200K (Alteryx) | $0 (open-source based) | $1.4M–$2.2M/yr |
| Cloud infrastructure | On-prem hardware ~$150K–$300K/yr | $165K/yr (realistic GCP) | Comparable — no capital expenditure |
| Professional services / support | $300K–$500K/yr | Included in platform delivery | $300K–$500K/yr |
| Migration (one-time) | N/A | ~$150K (8 months × 4 engineers) | One-time |
| Training & change management | ~$80K/yr | ~$30K (Year 1 only) | $50K–$80K ongoing |
Headline figures from this model: Year 1 net saving $2.2M+; break-even in Month 3–4 after go-live; 3-Year ROI over 1,000% ($6M–$8.7M saved on $645K invested).
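The Model A year-by-year table can be re-added mechanically. The sketch below uses only figures from that table, in thousands of dollars, and assumes ROI is defined as saving divided by investment (the source does not state its formula).

```python
# (label, current spend low, current spend high, DataFlow AI cost), all in $K
YEARLY = [
    ("Year 1", 2500, 3500, 315),
    ("Year 2", 1600, 2200, 165),
    ("Year 3", 1600, 2200, 165),
]


def totals() -> tuple[int, int, int]:
    """Sum the three yearly rows: (current low, current high, DataFlow AI cost)."""
    current_low = sum(row[1] for row in YEARLY)
    current_high = sum(row[2] for row in YEARLY)
    dataflow_cost = sum(row[3] for row in YEARLY)
    return current_low, current_high, dataflow_cost


def roi_percent(saving_k: float, invested_k: float) -> float:
    """ROI as saving over investment, in percent (assumed definition)."""
    return 100 * saving_k / invested_k


current_low, current_high, dataflow_cost = totals()
print(f"Current spend, 3 years: ${current_low}K to ${current_high}K")
print(f"DataFlow AI, 3 years:   ${dataflow_cost}K")
print(f"ROI range: {roi_percent(6000, dataflow_cost):.0f}% "
      f"to {roi_percent(8700, dataflow_cost):.0f}%")
```

The DataFlow AI column adds up to the stated $645K. The three current-spend rows add up to about $5.7M–$7.9M, which is below the ~$6.7M–$9.4M shown in the 3-Year total row; the source does not explain the gap. On the stated savings, ROI works out to roughly 930% to 1,350%, so "over 1,000%" holds for all but the low end of the range.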
Conservative 3-year ROI broken down by benefit category:
| Benefit category | 3-Year total | Confidence |
|---|---|---|
| Informatica licensing eliminated | $3,600K–$6,000K | High |
| Alteryx licensing eliminated | $450K–$750K | High |
| Informatica professional services / support eliminated | $900K–$1,500K | High |
| On-premises hardware reclaimed | $210K–$420K | Medium |
| Engineering productivity (60–80% faster pipeline development) | $600K–$1,200K | Medium |
| GDPR compliance risk reduction | $300K–$1,500K | Estimated |
| DSAR process automation (60+ DSARs/month) | $150K–$300K | High |
| Less: DataFlow AI total cost | ($645K) | — |
| Net benefit (conservative) | $5,615K+ | — |
Engineering productivity is valued at PLN 200K per engineer per year (the Polish senior data-engineer market rate). The GDPR figure uses a minimum UODO fine estimate.
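The net-benefit row can be recomputed from the benefit categories. The sketch below uses the table's figures as written, in thousands of dollars; it makes no assumption about which line is authoritative.

```python
# benefit category: (3-year low, 3-year high), in $K, copied from the table
BENEFITS = {
    "informatica_licensing": (3600, 6000),
    "alteryx_licensing": (450, 750),
    "informatica_services_support": (900, 1500),
    "on_prem_hardware": (210, 420),
    "engineering_productivity": (600, 1200),
    "gdpr_risk_reduction": (300, 1500),
    "dsar_automation": (150, 300),
}
DATAFLOW_COST = 645  # $K, Model A 3-year total cost


def net_benefit() -> tuple[int, int]:
    """Return (low, high) 3-year net benefit after subtracting platform cost."""
    gross_low = sum(low for low, _ in BENEFITS.values())
    gross_high = sum(high for _, high in BENEFITS.values())
    return gross_low - DATAFLOW_COST, gross_high - DATAFLOW_COST


net_low, net_high = net_benefit()
print(f"Net benefit: ${net_low}K to ${net_high}K")
```

Summing the low ends as listed gives $6,210K gross and $5,565K net, about $50K below the $5,615K+ shown in the table. The difference is small next to the totals, but it is one more place where the source figures do not add up exactly.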
Model B — RFI answer (platform fee + consumption credits)
This model prices DataFlow AI as a platform license + consumption credits + support — not $0 licensing.
| Component | Year 1 | Year 2 | Year 3 | 3-Year total |
|---|---|---|---|---|
| Platform license (all connectors, SSO, governance) | $110K | $110K | $110K | $330K |
| Consumption credits (500 pipelines, 15–25 TB daily) | $160K | $180K | $200K | $540K |
| Premium support + dedicated Technical Account Manager | $40K | $40K | $40K | $120K |
| Annual subtotal | $310K | $330K | $350K | $990K |
On top of the recurring cost, the RFI quotes a one-time Year 1 migration cost of $430K–$780K: PowerCenter migration $200K–$350K, Alteryx $50K–$100K, manual remediation (~20%) $100K–$200K, parallel run and validation $50K–$100K, and training for 20 users $30K.
Putting it together, the RFI total cost summary is: Year 1 (license + migration) $740K–$1,090K, Year 2 $330K, Year 3 $350K — a 3-Year TCO of $1,420K–$1,770K.
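The Model B total can be rebuilt from its components. The sketch below uses only the RFI figures quoted above, in thousands of dollars; the dictionary keys are shorthand labels chosen for this example.

```python
# Recurring components per year (Year 1, Year 2, Year 3), in $K
RECURRING = {
    "platform_license": (110, 110, 110),
    "consumption_credits": (160, 180, 200),
    "premium_support": (40, 40, 40),
}

# One-time Year 1 migration components (low, high), in $K
MIGRATION = {
    "powercenter_migration": (200, 350),
    "alteryx_migration": (50, 100),
    "manual_remediation": (100, 200),
    "parallel_run_validation": (50, 100),
    "training_20_users": (30, 30),
}


def annual_subtotals() -> tuple[int, ...]:
    """Recurring cost for each of the three years."""
    return tuple(sum(year) for year in zip(*RECURRING.values()))


def migration_range() -> tuple[int, int]:
    """One-time migration cost as (low, high)."""
    low = sum(lo for lo, _ in MIGRATION.values())
    high = sum(hi for _, hi in MIGRATION.values())
    return low, high


def three_year_tco() -> tuple[int, int]:
    """Recurring cost over three years plus one-time migration, as (low, high)."""
    recurring_total = sum(annual_subtotals())
    mig_low, mig_high = migration_range()
    return recurring_total + mig_low, recurring_total + mig_high


print(annual_subtotals(), migration_range(), three_year_tco())
```

Unlike Model A, the Model B figures are internally consistent: the components reproduce the $310K/$330K/$350K subtotals, the $430K–$780K migration range, and the $1,420K–$1,770K 3-Year TCO.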
The RFI also compares DataFlow AI against the alternatives:
| Option | RFI 3-Year TCO |
|---|---|
| DataFlow AI | $1.4M–$1.8M |
| Informatica IDMC | $2.4M–$4.5M |
| Qlik Talend (including CDC) | $1.5M–$2.5M |
| Keep current tools (Informatica PC + Alteryx) | $1.5M–$2.0M, with rising end-of-life risk |
The two models do not agree
Model A claims $0 software licensing and a 3-Year TCO of ~$645K. Model B charges a $110K/yr platform license plus consumption credits and lands at a 3-Year TCO of $1.42M–$1.77M. These are not reconcilable. When quoting a TCO figure, always state which model it comes from.
Non-financial benefits
Both models agree on the qualitative benefits:
- Faster time-to-insight — moving from batch to real-time enables same-day reporting and live dashboards.
- Engineering capacity freed up — 60–80% faster pipeline development means 3–5× more data products per year from the same team.
- Lower regulatory risk — UODO (the Polish data-protection authority) can fine up to PLN 100K per violation, and systemic violations up to 4% of annual global turnover.
- Strategic asset ownership — Polkomtel owns the platform code and can extend it.
- AI competitive advantage — native Anthropic Claude integration.
KPIs and success criteria
Success is defined as a measurable target state 12 months after go-live.
| Success metric | Current state | 12-month target |
|---|---|---|
| Informatica workflows migrated and in production | 0 | 500+ (100%) |
| Informatica PowerCenter status | Fully operational | Decommissioned (licensing cancelled) |
| Alteryx workflows migrated | 0 | 50–100 (100%) |
| Annual licensing cost reduction | $0 | $1.4M–$2.2M/year saved |
| Real-time CDC streams operational | 0 | 5+ (Oracle, PostgreSQL, MSSQL, MySQL, MongoDB) |
| Average new pipeline development time | 3–10 days | < 1 day (AI-assisted) |
| DSAR response time | 3–5 business days (manual) | < 5 minutes (automated) |
| Active data-quality rules | 0 | 200+ rules across all critical datasets |
| Column-level lineage coverage | 0% | > 90% of pipeline-managed datasets |
| GDPR-compliant PII classification | Manual, incomplete | Automated — all new data classified on ingestion |
| Production platform uptime | N/A | > 99.95% (Professional SLA target) |
| Active users (all roles) | 0 | 30–50+ |
Maturity comparison
A side-by-side of where Polkomtel is today versus where DataFlow AI takes it:
| Dimension | Current (Informatica) | Target (DataFlow AI) |
|---|---|---|
| Processing latency | Batch only (daily/hourly) | Real-time (milliseconds, via CDC + Flink) |
| Development speed | 3–10 days per pipeline | 2–8 hours per pipeline (AI-assisted) |
| Data governance | Manual, spreadsheet-based | Automated, policy-driven, RODO compliant |
| Lineage visibility | None (black-box mappings) | Column-level, full history |
| Self-service access | Engineering-mediated only | AI Copilot for any authorized user |
| Infrastructure model | Fixed on-premises servers | Auto-scaling serverless GCP |
| Cost structure | High fixed licensing | Low variable cloud consumption |
| Vendor independence | Fully dependent on Informatica | Open source plus owned code |
Support and SLA tiers
auraliscode offers three support tiers. The "SLA" (Service Level Agreement) is the contractual promise on uptime and response time. "RPO" is the maximum acceptable data loss measured in time; "RTO" is the maximum acceptable time to restore service.
| Tier | Uptime | Coverage | P1 response | Channels | DR target (RPO / RTO) |
|---|---|---|---|---|---|
| Standard | 99.9% | 9×5 (Mon–Fri, 08:00–17:00 CET) | 4 hours | Email + portal | RPO 4h / RTO 1h |
| Professional | 99.95% | 24×7 | 1 hour | Email + phone + portal; dedicated Customer Success Manager | RPO 2h / RTO 30 min |
| Enterprise | 99.99% | 24×7×365 | 15 minutes | All of the above + dedicated Slack; up to 2 on-site engineer days/month; custom SLA with financial penalties | RPO 30 min / RTO 15 min |
Priority levels: P1 Critical (outage or data corruption, no workaround), P2 High (major feature down, workaround exists), P3 Medium (minor degradation), P4 Low (questions and feature requests).
Disaster recovery uses a dual-region GCP setup — europe-central2 (Warsaw) as primary and europe-west3 (Frankfurt) as a standby. Both regions are inside the EU, so GDPR data residency is preserved even after a failover. Scheduled maintenance is capped at 4 hours per calendar month, announced 48 hours in advance, and scheduled between 02:00 and 06:00 CET.
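An uptime percentage is easier to judge once it is converted into a downtime budget. The sketch below does that for the three tiers, assuming a 365-day year and assuming the percentage is measured over all minutes in the period; the source does not say whether scheduled maintenance counts against it.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year

TIER_UPTIME = {"Standard": 99.9, "Professional": 99.95, "Enterprise": 99.99}


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime permitted in the period at the given uptime."""
    return period_minutes * (1 - uptime_percent / 100)


for tier, uptime in TIER_UPTIME.items():
    minutes = allowed_downtime_minutes(uptime)
    print(f"{tier}: {minutes:.1f} min/year ({minutes / 60:.2f} h/year)")
```

This gives roughly 8.8 hours per year for Standard, 4.4 hours for Professional, and 53 minutes for Enterprise. The maintenance cap of 4 hours per month (48 hours per year) is larger than any of these budgets, so the uptime targets presumably exclude scheduled maintenance windows; that reading is an inference, not something the source states.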
GDPR / RODO is built in, not bolted on
Every feature that touches personal data ships with built-in controls: automatic PII detection on schema discovery (PESEL, IMSI/IMEI, biometric, location, financial, contact data, and GDPR Article 9 special categories), a DSAR API that traces a person's data through the lineage graph, and tamper-proof audit logs meeting SOC 2 Type II standards. Primary data stays in GCP europe-central2 (Warsaw) and the disaster-recovery standby is in europe-west3 (Frankfurt), so no data leaves the EU.
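As an illustration of what PII detection on schema discovery could look like for one identifier type, the sketch below flags a column as a likely PESEL column when most sampled values pass the public PESEL check-digit rule. It is not the platform's detector: the function names and the 90% threshold are assumptions, and the check covers only the format and checksum, not the birth date encoded in the number.

```python
import re

# Public PESEL check-digit weights for the first ten digits.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
PESEL_PATTERN = re.compile(r"[0-9]{11}")


def is_valid_pesel(value: str) -> bool:
    """True if value is 11 digits and its last digit matches the checksum."""
    if not PESEL_PATTERN.fullmatch(value):
        return False
    digits = [int(ch) for ch in value]
    weighted_sum = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    return (10 - weighted_sum % 10) % 10 == digits[10]


def looks_like_pesel_column(samples: list, min_share: float = 0.9) -> bool:
    """Flag a column when at least min_share of sampled values are valid PESELs."""
    if not samples:
        return False
    valid = sum(1 for value in samples if is_valid_pesel(str(value).strip()))
    return valid / len(samples) >= min_share


# 44051401359 is a widely published sample PESEL with a valid check digit.
sample_column = ["44051401359", " 44051401359 ", "not-a-number"]
flagged_strict = looks_like_pesel_column(sample_column)
flagged_loose = looks_like_pesel_column(sample_column, min_share=0.5)
```

A checksum test keeps false positives down compared with a bare 11-digit pattern, because only about one in ten random 11-digit strings passes it. A production classifier would typically combine several signals, such as column names and date plausibility.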
Where to go next
- For deployment topologies, infrastructure sizing, and the full GCP cost breakdown, see Deployment scenarios & sizing.
- For the actual build, release, and rollout mechanics, see Deployment & rollout.
- For the role-by-role permission model, see Personas & roles and RBAC.