Operations
Deployment scenarios & sizing
DataFlow AI can be deployed in three topologies, each with a different balance of cost, control, and time-to-launch. This page describes all three (On-Premises, GCP Full Cloud, and the recommended Hybrid). For each it covers infrastructure requirements, capacity sizing, cost tables, and security posture, and it ends with a decision guide for choosing between them.
The three topologies at a glance
A "topology" here means where the platform runs and where the data lives. The deployment-scenarios source document defines three, and recommends Hybrid for Polkomtel Plus.
| Topology | Where it runs | 3-Year TCO (USD) | Annual cost | Time to first production pipeline | Full production |
|---|---|---|---|---|---|
| 1. On-Premises | Polkomtel's own Warsaw data center | $2,286,000 (≈9,144,000 PLN, incl. CapEx) | $582K OpEx + $540K CapEx in Year 0 | 14–24 weeks | 6–9 months |
| 2. GCP Full Cloud | GCP europe-central2 (Warsaw), DR in europe-west3 (Frankfurt) | $495,000 (realistic) | $165,000 (realistic) | 3–4 weeks | 2–3 months |
| 3. Hybrid ⭐ Recommended | Source databases stay on-prem; the platform runs on GCP | $606,000–$741,000 | $197,000–$242,000 | 10–14 weeks | 4–6 months |
A few terms used throughout this page:
- CapEx (capital expenditure) — a large upfront purchase, e.g. buying servers.
- OpEx (operating expenditure) — a recurring cost, e.g. a monthly cloud bill.
- TCO (total cost of ownership) — the all-in cost over a defined period, here three years.
- CDC (change data capture) — streaming each database change as it happens.
- DR (disaster recovery) — a standby copy of the system in a second location.
Pricing basis
All cost figures use GCP list prices for Q1 2026, an exchange rate of 1 USD = 4.00 PLN, and Dell Q1 2026 Polish enterprise channel pricing. Polish VAT (23%) is not included in any cost table. GCP always bills in USD, which introduces foreign-exchange risk for a PLN-denominated budget.
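The budget is held in PLN, but GCP invoices in USD. As a result, the PLN cost of a fixed USD bill moves with the exchange rate. The sketch below shows that sensitivity for the Realistic GCP Full Cloud figure of $165,000/yr. The 3.80 and 4.20 rates are hypothetical ±5% moves around the 4.00 pricing basis, not forecasts.

```python
BASE_RATE = 4.00  # PLN per USD, the pricing-basis rate used on this page


def usd_to_pln(usd: float, rate: float = BASE_RATE) -> float:
    """Convert a USD amount to PLN at the given PLN-per-USD rate."""
    return usd * rate


annual_gcp_usd = 165_000  # GCP Full Cloud, Realistic profile

# Hypothetical +/-5% moves around the pricing-basis rate, for illustration only.
for rate in (3.80, BASE_RATE, 4.20):
    pln = usd_to_pln(annual_gcp_usd, rate)
    print(f"{rate:.2f} PLN/USD -> {pln:,.0f} PLN/yr")
```

At the basis rate this is 660,000 PLN/yr. A 5% swing in either direction moves it by about 33,000 PLN/yr.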
Scenario 1 — On-Premises
Every DataFlow AI component runs on hardware Polkomtel owns and operates in its own Warsaw data center, on bare-metal Dell PowerEdge or VMware-virtualized servers, with Kubernetes provided by Red Hat OpenShift. No data ever leaves the corporate network — this is the maximum-data-sovereignty option.
Headline numbers: 2,160,000 PLN CapEx · $582K annual OpEx · $2.286M 3-Year TCO · up to 24 weeks to launch · 13 Kubernetes nodes (10 workers + 3 control plane) · 100% data sovereignty.
Infrastructure — hardware to buy (CapEx)
The On-Premises topology requires a one-time hardware purchase totalling 2,160,000 PLN (≈$540,000).
| Component | Specification | Count | Unit (PLN) | Total (PLN) |
|---|---|---|---|---|
| Kubernetes worker nodes | Dell PowerEdge R750 · 2× Xeon Silver 4316 · 256 GB ECC · 2× 1.92 TB NVMe · 25 GbE | 10 | 65,000 | 650,000 |
| Kubernetes control plane | Dell PowerEdge R650 · 2× Xeon Silver 4310 · 128 GB · 2× 960 GB NVMe | 3 | 35,000 | 105,000 |
| PostgreSQL HA servers | Dell PowerEdge R750xs · 2× Xeon Gold 5318Y · 512 GB · 8× 3.84 TB NVMe RAID10 | 2 | 95,000 | 190,000 |
| Redis cluster nodes | Dell PowerEdge R650 · Xeon Silver 4310 · 128 GB · 4× 960 GB NVMe | 3 | 30,000 | 90,000 |
| Kafka broker nodes | Dell PowerEdge R750 · Xeon Silver 4316 · 128 GB · 6× 3.84 TB NVMe · KRaft | 5 | 55,000 | 275,000 |
| Storage array (NAS) | NetApp AFF A400 · 200 TB raw (50 TB usable after RAID + replication) · 100 GbE | 1 | 280,000 | 280,000 |
| Load balancers | F5 BIG-IP i2800 · HA pair · WAF module · SSL offload | 2 | 60,000 | 120,000 |
| Top-of-rack switches | Arista 7050X3 · 32× 100 GbE | 4 | 45,000 | 180,000 |
| Spine switches | Arista 7280R3 · 36× 400 GbE | 2 | 85,000 | 170,000 |
| UPS (N+1) | APC Symmetra LX 40 kVA · 15-min battery at full load | 2 | 50,000 | 100,000 |
| Total CapEx | | | | 2,160,000 PLN (≈$540,000) |
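The CapEx total is count × unit price, summed over the table. The small sketch below reproduces it. The component list is copied from the table; the code itself is illustrative and not part of DataFlow AI.

```python
PLN_PER_USD = 4.00  # exchange rate from the pricing basis

# (component, count, unit price in PLN), copied from the CapEx table.
CAPEX_ITEMS = [
    ("Kubernetes worker nodes", 10, 65_000),
    ("Kubernetes control plane", 3, 35_000),
    ("PostgreSQL HA servers", 2, 95_000),
    ("Redis cluster nodes", 3, 30_000),
    ("Kafka broker nodes", 5, 55_000),
    ("Storage array (NAS)", 1, 280_000),
    ("Load balancers", 2, 60_000),
    ("Top-of-rack switches", 4, 45_000),
    ("Spine switches", 2, 85_000),
    ("UPS (N+1)", 2, 50_000),
]


def total_capex_pln(items):
    """Sum count x unit price over every line of the bill of materials."""
    return sum(count * unit for _, count, unit in items)


capex_pln = total_capex_pln(CAPEX_ITEMS)
print(f"{capex_pln:,} PLN = ${capex_pln / PLN_PER_USD:,.0f}")
# prints: 2,160,000 PLN = $540,000
```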
Infrastructure — annual running cost (OpEx)
Running the on-prem estate costs 2,328,000 PLN/yr (≈$582,000/yr).
| Category | Annual (PLN) | Notes |
|---|---|---|
| Colocation (Warsaw DC) | 336,000 | 6 racks, 30 kW, precision cooling, dual 10 Gbps uplinks |
| Power & cooling | 102,000 | 30 kW average; cooling at PUE 1.4 |
| Internet bandwidth | 108,000 | 10 Gbps redundant fiber, 2 ISPs, BGP failover |
| Red Hat OpenShift Enterprise | 216,000 | 13 nodes, Red Hat support, ACM + ACS security |
| Confluent Platform (Kafka) | 336,000 | 5 broker enterprise license, Schema Registry, ksqlDB |
| HashiCorp Vault Enterprise | 72,000 | 3-node HA, HSM seal, audit logging |
| Hardware maintenance (15% of CapEx/yr) | 324,000 | Dell ProSupport+ |
| Backup & DR software | 60,000 | Veeam, immutable backups |
| Security (EDR/IDS/IPS) | 42,000 | CrowdStrike, Fortinet, Nessus |
| Infrastructure engineers (2 FTE) | 300,000 | 2 dedicated senior engineers |
| Hardware amortization (5-year) | 432,000 | 2.16M PLN ÷ 5 years |
| Total annual OpEx | 2,328,000 PLN/yr | ≈$582,000/yr |
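Two of the OpEx lines are derived from CapEx rather than quoted directly:

- maintenance, at 15% of CapEx;
- five-year straight-line amortization.

The sketch below rebuilds the annual total from the table and keeps those two lines as formulas.

```python
CAPEX_PLN = 2_160_000
PLN_PER_USD = 4.00

# Lines quoted directly in the OpEx table (PLN per year).
QUOTED_OPEX_PLN = {
    "Colocation (Warsaw DC)": 336_000,
    "Power & cooling": 102_000,
    "Internet bandwidth": 108_000,
    "Red Hat OpenShift Enterprise": 216_000,
    "Confluent Platform (Kafka)": 336_000,
    "HashiCorp Vault Enterprise": 72_000,
    "Backup & DR software": 60_000,
    "Security (EDR/IDS/IPS)": 42_000,
    "Infrastructure engineers (2 FTE)": 300_000,
}

# Lines derived from CapEx, kept in integer PLN to avoid float rounding.
maintenance_pln = CAPEX_PLN * 15 // 100  # 15% of CapEx per year
amortization_pln = CAPEX_PLN // 5        # 5-year straight-line amortization

annual_opex_pln = sum(QUOTED_OPEX_PLN.values()) + maintenance_pln + amortization_pln
print(f"{annual_opex_pln:,} PLN/yr = ${annual_opex_pln / PLN_PER_USD:,.0f}/yr")
# prints: 2,328,000 PLN/yr = $582,000/yr
```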
3-Year TCO and scaling
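Year 0 CapEx is 2,160,000 PLN, and OpEx in each of Years 1–3 is 2,328,000 PLN. That gives a 3-year total of 9,144,000 PLN (≈$2,286,000). However, the OpEx figure already includes the 432,000 PLN/yr hardware amortization line, so adding the full CapEx on top counts part of the hardware cost twice. Without the amortization line, the 3-year total is 7,848,000 PLN (≈$1,962,000). There is no elastic scaling: the hardware is provisioned for peak load from day one. A hardware refresh is required at Year 5, costing a further 2,160,000 PLN.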
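The 3-year figure is Year 0 CapEx plus three years of flat OpEx. The sketch below reproduces that arithmetic. Its `include_amortization` flag is an illustrative addition: it switches between the total as quoted and a variant that drops the amortization row.

```python
CAPEX_PLN = 2_160_000        # Year 0 hardware purchase
ANNUAL_OPEX_PLN = 2_328_000  # Years 1-3, as quoted (includes amortization)
AMORTIZATION_PLN = 432_000   # 2,160,000 PLN / 5 years, part of the OpEx above
PLN_PER_USD = 4.00


def onprem_tco_pln(years: int = 3, include_amortization: bool = True) -> int:
    """Year 0 CapEx plus a flat annual OpEx for each following year."""
    opex = ANNUAL_OPEX_PLN if include_amortization else ANNUAL_OPEX_PLN - AMORTIZATION_PLN
    return CAPEX_PLN + opex * years


for flag in (True, False):
    tco = onprem_tco_pln(include_amortization=flag)
    print(f"amortization included={flag}: {tco:,} PLN = ${tco / PLN_PER_USD:,.0f}")
```

The Year 5 refresh is not modeled; it would add another 2,160,000 PLN outside this three-year window.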
On-prem scaling is manual and slow. Worker nodes are added when average CPU exceeds 70% sustained for two weeks (order three R750 nodes — 8-week lead time). A sixth Kafka broker is added when a broker's partition-leader count exceeds 200 (55,000 PLN + 1 week to install). Power has headroom to 45 kW before the colocation footprint must expand.
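The manual scale-out rules above amount to three threshold checks. The minimal sketch below makes several assumptions, none of which come from DataFlow AI's actual monitoring:

- It reads a hypothetical monitoring snapshot with daily average worker CPU, the highest partition-leader count on any broker, and current power draw.
- `ClusterMetrics` and its fields are invented for illustration.
- "Sustained for two weeks" is read strictly: every one of the last 14 daily averages must be above 70%.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterMetrics:
    """Hypothetical monitoring snapshot; field names are illustrative only."""
    daily_cpu_pct: List[float]   # daily average worker CPU, oldest first
    max_partition_leaders: int   # highest partition-leader count on any broker
    power_draw_kw: float         # current average power draw


def scale_out_actions(m: ClusterMetrics) -> List[str]:
    """Apply the On-Premises scale-out rules described in the text."""
    actions = []
    last_two_weeks = m.daily_cpu_pct[-14:]
    # Worker nodes: CPU above 70% on each of the last 14 days.
    if len(last_two_weeks) == 14 and all(d > 70 for d in last_two_weeks):
        actions.append("order three R750 worker nodes (8-week lead time)")
    # Kafka: some broker leads more than 200 partitions.
    if m.max_partition_leaders > 200:
        actions.append("add a sixth Kafka broker (55,000 PLN, ~1 week to install)")
    # Power: headroom ends at 45 kW.
    if m.power_draw_kw > 45:
        actions.append("expand the colocation footprint (power headroom exceeded)")
    return actions


print(scale_out_actions(ClusterMetrics([72.0] * 14, 180, 31.0)))
```

Because every action has a lead time of one to eight weeks, a real check would need to fire well before these hard limits are reached.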
When to choose On-Premises
- Best for: maximum data sovereignty; air-gapped or high-security environments; organizations that already have a data center and a large infrastructure staff; regulatory environments that prohibit cloud entirely (military, government).
- Not recommended for: fast time-to-launch (under 3 months); elastic or unpredictable workloads; teams that want managed services; heavy AI/ML work that needs GCP's Vertex AI or BigQuery ML.
Scenario 2 — GCP Full Cloud
The entire platform runs on Google Cloud Platform in the europe-central2 (Warsaw, Poland) region, with disaster recovery in europe-west3 (Frankfurt, Germany). GKE Autopilot manages all containerized workloads, and the data services — Cloud SQL, Memorystore, Dataproc Serverless, Cloud Composer — are fully managed by Google.
Headline numbers: $0 CapEx · $13,750/month (realistic) · $495K 3-Year TCO (realistic) · 3–4 weeks to launch · 100% managed services.
Three sub-scenarios
GCP Full Cloud has three cost profiles depending on load. The middle one — Realistic — is the recommended baseline.
| Sub-scenario | Monthly | Annual | Profile |
|---|---|---|---|
| Minimum | $8,150 | $97,800 | Dev/test or lean production, ~50 active pipelines, no streaming CDC, no DR, single region |
| Realistic ⭐ | $13,750 | $165,000 | Full production, 200 active pipelines, moderate CDC streaming, full HA, warm DR standby |
| Pessimistic | $23,200 | $278,400 | Peak load, heavy CDC, active-active DR in both regions, 40+ TB/day |
The Minimum profile is explicitly not recommended for production Polkomtel workloads — it has no high availability and no disaster recovery.
GCP service cost breakdown (Realistic profile)
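The Realistic profile's $13,750/month breaks down across twelve service categories, shown alongside the Minimum and Pessimistic profiles. The source cost analysis sizes this breakdown for 500+ pipelines, 15–25 TB/day, and 30–50 concurrent executions. That is a larger footprint than the 200-pipeline Realistic profile described above. Note also that the category figures, as given, do not add up to the stated totals. They sum to $6,270 (Min), $14,850 (Realistic), and $35,765 (Pessimistic) per month. The stated totals are the figures used everywhere else on this page.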
| Category | Min | Realistic | Pessimistic |
|---|---|---|---|
| 1. Compute — GKE Autopilot (platform services, pipeline engine, Flink, connectors, cluster fee) | $1,873 | $3,406 | $6,639 |
| 2. Dataproc — Spark batch jobs + history server | $729 | $2,652 | $6,924 |
| 3. Database — Cloud SQL PG15 HA (instance, SSD, backups, read replica) | $320 | $754 | $1,741 |
| 4. Storage — Cloud Storage (Standard + Nearline + operations) | $65 | $193 | $639 |
| 5. Messaging — Confluent Kafka + Pub/Sub | $2,396 | $4,836 | $9,822 |
| 6. Caching — Memorystore Redis (primary HA + read replica) | $143 | $716 | $1,432 |
| 7. Orchestration — Cloud Composer (Airflow) | $282 | $565 | $1,130 |
| 8. Networking — VPN, egress, NAT, load balancer, DNS | $302 | $590 | $1,959 |
| 9. Security — Cloud Armor, Cloud KMS, Secret Manager | $32 | $121 | $445 |
| 10. Operations — Logging, Monitoring, Artifact Registry, Cloud Build | $29 | $89 | $300 |
| 11. Disaster recovery (europe-west3) | $20 | $872 | $4,270 |
| 12. Miscellaneous buffer (5–7%) | $79 | $56 | $464 |
| Total monthly | $8,150 | $13,750 | $23,200 |
| Total annual | $97,800 | $165,000 | $278,400 |
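A cost table like this can be reconciled by re-adding each column. The sketch below copies the twelve category rows and compares the column sums with the stated totals. It shows that the line items, as transcribed, do not reconcile with the totals.

```python
# Monthly USD per category, copied from the table: (min, realistic, pessimistic).
CATEGORIES = {
    "Compute (GKE Autopilot)": (1_873, 3_406, 6_639),
    "Dataproc": (729, 2_652, 6_924),
    "Cloud SQL": (320, 754, 1_741),
    "Cloud Storage": (65, 193, 639),
    "Messaging (Confluent + Pub/Sub)": (2_396, 4_836, 9_822),
    "Memorystore Redis": (143, 716, 1_432),
    "Cloud Composer": (282, 565, 1_130),
    "Networking": (302, 590, 1_959),
    "Security": (32, 121, 445),
    "Operations": (29, 89, 300),
    "Disaster recovery": (20, 872, 4_270),
    "Miscellaneous buffer": (79, 56, 464),
}
STATED_TOTALS = (8_150, 13_750, 23_200)
PROFILES = ("Min", "Realistic", "Pessimistic")


def column_sums(categories):
    """Add up each profile column across all categories."""
    return tuple(sum(row[i] for row in categories.values()) for i in range(3))


for name, summed, stated in zip(PROFILES, column_sums(CATEGORIES), STATED_TOTALS):
    flag = "ok" if summed == stated else f"differs by {summed - stated:+,}"
    print(f"{name}: items sum to ${summed:,}/mo, stated ${stated:,}/mo ({flag})")
```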
Two cost levers dominate:
- Confluent Cloud Kafka is the single largest line item ($4,752/month in the Realistic profile) and the most negotiable — an annual commitment typically yields a 30–50% discount.
- Dataproc Serverless cost is the most variable. "Pushdown SQL" — running the transformation inside the source database (Teradata, Snowflake) instead of moving data to Spark — is the primary cost lever.
3-Year TCO (GCP Full Cloud)
| Profile | Year 1 | Year 2 | Year 3 | 3-Year total |
|---|---|---|---|---|
| Minimum | $97,800 | $97,800 | $97,800 | $293,400 |
| Realistic | $165,000 | $165,000 | $165,000 | $495,000 |
| Realistic + 3-year CUD | $155,000 | $134,400 | $134,400 | $423,800 |
| Pessimistic | $278,400 | $278,400 | $278,400 | $835,200 |
A CUD (Committed Use Discount) is a price reduction Google gives in exchange for committing to a 1- or 3-year usage level. Applying a full 3-year CUD plus Dataproc Spot instances can cut the realistic annual GCP cost from $165,000 down to roughly $104,076/yr — a 3-year saving of about $182,772. The catch: CUDs require an upfront commitment, and Google bills the committed amount monthly regardless of actual usage.
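The CUD saving quoted above is the annual difference multiplied by three. The sketch below shows that arithmetic, with both annual figures taken from the text.

```python
ON_DEMAND_ANNUAL_USD = 165_000     # Realistic profile at list prices
CUD_AND_SPOT_ANNUAL_USD = 104_076  # with a full 3-year CUD plus Dataproc Spot


def multi_year_saving(baseline: int, discounted: int, years: int = 3) -> int:
    """Annual difference multiplied by the commitment length."""
    return (baseline - discounted) * years


saving = multi_year_saving(ON_DEMAND_ANNUAL_USD, CUD_AND_SPOT_ANNUAL_USD)
discount = 1 - CUD_AND_SPOT_ANNUAL_USD / ON_DEMAND_ANNUAL_USD
print(f"3-year saving ${saving:,}, about {discount:.0%} off the annual bill")
```

The sketch does not model the commitment itself. Google bills the committed amount whether or not it is used, so the real saving shrinks if usage falls below the commitment.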
When to choose GCP Full Cloud
- Pros: zero CapEx; elastic auto-scaling; fully managed data services (no database administration); the Warsaw region keeps data resident in Poland, supporting GDPR compliance; built-in DR in Frankfurt with under-60-second failover; access to Claude API, Vertex AI, and BigQuery ML; provisioning in 24–48 hours; automatic patching; fastest time-to-launch at 3–4 weeks.
- Cons: ongoing spend with no "paid off" point; data-egress costs when results are sent back on-prem; internet dependency; billing spikes if workloads exceed estimates; moving 100 TB+ of Teradata data into GCP is expensive and risky; GCP vendor lock-in; foreign-exchange risk because Google bills in USD.
Scenario 3 — Hybrid (recommended)
In the Hybrid topology, the source databases — Teradata, Oracle, SAP HANA, MSSQL — stay on-premises (they are already there, and moving them is expensive and risky), while the DataFlow AI platform itself runs on GCP europe-central2. The two halves are joined by a dedicated, private 10 Gbps Google Cloud Interconnect link with under-5-millisecond latency.
Headline numbers: $0 new CapEx · $12,740/month GCP spend · $242K total annual · $741K 3-Year TCO · 10–14 weeks to launch · under-5 ms Interconnect latency.
What stays on-prem and why
| Component | Why it stays on-prem | New cost |
|---|---|---|
| Teradata Data Warehouse | Existing investment, 100 TB+, migration risk; pushdown SQL runs near it | $0 (existing) |
| Oracle ERP database | Business-critical; on-prem regulatory policy; license tied to hardware | $0 (existing) |
| SAP HANA | SAP licensing tied to on-prem servers | $0 (existing) |
| Active Directory | Corporate identity; federated to Keycloak on GCP via LDAP | $0 (existing) |
| Debezium CDC agents (4 VMs) | Co-located with the source databases for low-latency change capture | $0 (existing VMware capacity) |
| PII masking agents (2 VMs) | Mask PESEL and other personal data before it crosses to GCP | $0 (existing VMware capacity) |
| On-prem Kafka buffer (3 brokers) | Absorbs CDC bursts; retains events if the Interconnect link drops | 180,000 PLN/yr (or $0 if existing servers are reused) |
| Cloud Interconnect on-prem termination | Cisco ASR edge router, Warsaw cross-connect | 7,200 PLN/month |
A key compliance feature: personal data is masked on-prem at the Debezium layer before it ever crosses to GCP. PESEL, NIP, REGON, phone numbers, and email addresses are pseudonymized with a deterministic SHA-256 hash. The AI Copilot only ever receives schema context — never raw billing or customer data.
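The source describes deterministic SHA-256 pseudonymization. The sketch below swaps in a keyed variant, HMAC-SHA-256. It is still deterministic, but an attacker without the key cannot recover a PESEL by hashing every possible 11-digit value, which is feasible against an unkeyed hash. The column names, key handling, and row shape are assumptions for illustration, not the Debezium configuration DataFlow AI actually uses.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical key; a real deployment would load it from a secret store.
PSEUDONYM_KEY = b"example-key-load-from-secret-store"

# Hypothetical column names for the identifiers listed above.
PII_COLUMNS = {"pesel", "nip", "regon", "phone", "email"}


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Deterministic keyed hash: the same input and key always give the same token."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace PII columns in a change-event row before it leaves the on-prem network."""
    return {
        col: pseudonymize(str(val)) if col in PII_COLUMNS and val is not None else val
        for col, val in row.items()
    }


event = {"customer_id": 42, "pesel": "90010112345", "email": "Jan@Example.pl", "plan": "5G"}
print(mask_row(event))
```

Because the token is deterministic, masked columns can still serve as join keys across sources. That is one common reason to prefer deterministic over random pseudonymization.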
Hybrid cost breakdown
The GCP-side platform cost in the Hybrid topology is lower than full-cloud — about 7% lower — because the CDC agents and their Kafka buffering run on-prem, reducing GKE, Kafka, and storage spend on GCP.
| Cost component | Annual |
|---|---|
| On-prem incremental (Interconnect, Kafka buffer, 0.5 FTE network engineer) | ~$89,100/yr (or ~$44,100/yr with existing Kafka) |
| GCP platform ($12,740/month) | $152,880/yr |
| Combined Hybrid annual | $241,980/yr (≈$242,000) |
| Combined — reusing existing on-prem Kafka | ≈$197,000/yr |
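The Hybrid annual figure is the GCP platform spend plus the on-prem increment. The sketch below computes both variants. It assumes, as the ≈$197,000 figure implies, that the only difference between them is the 180,000 PLN/yr Kafka buffer (about $45,000 at 4.00 PLN/USD).

```python
PLN_PER_USD = 4.00
GCP_MONTHLY_USD = 12_740         # Hybrid GCP platform spend
ONPREM_INCREMENTAL_USD = 89_100  # Interconnect + Kafka buffer + 0.5 FTE network engineer
KAFKA_BUFFER_PLN = 180_000       # new on-prem Kafka buffer, per year

gcp_annual = GCP_MONTHLY_USD * 12
with_new_kafka = gcp_annual + ONPREM_INCREMENTAL_USD
# Assumption: reusing existing servers removes only the Kafka buffer line.
reusing_kafka = with_new_kafka - KAFKA_BUFFER_PLN / PLN_PER_USD

print(f"GCP platform:   ${gcp_annual:,}/yr")
print(f"with new Kafka: ${with_new_kafka:,}/yr")
print(f"reusing Kafka:  ${reusing_kafka:,.0f}/yr")
```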
3-Year TCO (Hybrid)
| Variant | Year 1 | Year 2 | Year 3 | 3-Year total |
|---|---|---|---|---|
| Hybrid (new Kafka hardware) | $257,000 (incl. +$15K Interconnect setup) | $242,000 | $242,000 | $741,000 |
| Hybrid (existing on-prem Kafka) | $212,000 | $197,000 | $197,000 | $606,000 |
There is a one-time ~$15,000 physical cross-connect installation at the Warsaw Interconnect point of presence, with a 6–8 week lead time for the physical circuit — this is the critical path for a Hybrid rollout.
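The Hybrid 3-year figures add the one-time ~$15,000 Interconnect setup to Year 1 only. The sketch below reproduces both rows of the table.

```python
from typing import List

INTERCONNECT_SETUP_USD = 15_000  # one-time cross-connect installation


def hybrid_yearly_costs(annual_usd: int, years: int = 3,
                        setup_usd: int = INTERCONNECT_SETUP_USD) -> List[int]:
    """Flat annual cost, with the one-time setup fee added to Year 1 only."""
    return [annual_usd + (setup_usd if year == 1 else 0) for year in range(1, years + 1)]


for label, annual in (("new Kafka hardware", 242_000), ("existing on-prem Kafka", 197_000)):
    costs = hybrid_yearly_costs(annual)
    print(f"{label}: {costs} -> ${sum(costs):,}")
```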
When to choose Hybrid
- Pros: keeps sensitive source databases on-prem; reuses existing infrastructure; GCP handles elastic compute, AI, and analytics; the 10 Gbps Interconnect is private, not over the internet; PII is masked on-prem before reaching GCP (RODO-compliant); a progressive migration path; lower GCP costs than full-cloud; a 3-year TCO of ~$606K versus ~$2.286M for On-Premises, a saving of about $1.68M.
- Cons: the most complex to set up (two environments); a 6–8 week Interconnect lead time; needs network engineers for BGP/VPN/VLAN configuration; data residency is split (metadata in GCP, raw data on-prem); partial dependency on Interconnect uptime — though the on-prem Kafka buffer holds events locally for 48 hours.
Why Hybrid is recommended for Polkomtel
Polkomtel already owns Teradata, Oracle, and SAP HANA on-prem. That is a sunk cost best kept in place to avoid migration risk. Hybrid costs $197K–$242K/yr versus $165K/yr for GCP-only, a $32K–$77K/yr premium that buys data sovereignty. Over three years, Hybrid saves against On-Premises in both variants:

- with new Kafka hardware: about $1.55M ($741K versus $2,286K, roughly 68% lower);
- reusing existing on-prem Kafka: about $1.68M ($606K versus $2,286K, roughly 73% lower).

Either way, the return on investment is above 200%. Hybrid also launches in 10–14 weeks rather than 6–9 months, which matters for hitting the Informatica decommission deadline.
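The savings comparison can be reproduced from the two TCO figures. In the sketch below, ROI is treated as 3-year savings divided by 3-year Hybrid spend. That definition is an assumption about how the "above 200%" figure is derived.

```python
from typing import Tuple

ONPREM_TCO_USD = 2_286_000
HYBRID_TCO_USD = {"new Kafka hardware": 741_000, "existing on-prem Kafka": 606_000}


def compare(onprem: int, hybrid: int) -> Tuple[int, float, float]:
    """Return (saving, saving as a share of On-Prem TCO, saving / Hybrid spend)."""
    saving = onprem - hybrid
    return saving, saving / onprem, saving / hybrid


for variant, tco in HYBRID_TCO_USD.items():
    saving, share, roi = compare(ONPREM_TCO_USD, tco)
    print(f"{variant}: saves ${saving:,} ({share:.0%} lower TCO, ROI {roi:.0%})")
```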
Capacity planning and growth
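The three-year capacity plan assumes a Year 1 launch, with growth of +30% by Year 2 and +50% by Year 3, both measured against Year 1. The active-pipeline count follows these rates exactly; data volume, CDC throughput, and AI Copilot usage are planned to grow faster.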
| Metric | Year 1 | Year 2 | Year 3 | On-Prem impact | GCP / Hybrid impact |
|---|---|---|---|---|---|
| Active pipelines | 200 | 260 | 300 | Possible K8s node scale-out in Year 3 | GKE autoscales; +$600/mo in Year 3 |
| Daily data volume processed | 500 GB | 800 GB | 1,200 GB | NetApp headroom; monitor Kafka storage | Cloud Storage is unbounded; +$150/mo Dataproc in Year 3 |
| CDC event throughput (peak) | 5,000 eps | 8,000 eps | 12,000 eps | May need a 6th Kafka broker in Year 3 | Confluent autoscales partitions |
| AI Copilot queries/day | 500 | 1,500 | 3,000 | Needs Anthropic API regardless | Claude API +$100/mo Year 2, +$250/mo Year 3 |
| Concurrent users | 50 | 80 | 120 | K8s horizontal pod autoscaler handles it | GKE autoscales |
| Lineage graph nodes | 50,000 | 150,000 | 500,000 | pgvector tuning needed at 500K | Cloud SQL auto-grows storage |
The Year 3 cost increase is roughly +$130K CapEx and +$50K/yr OpEx for On-Premises, versus +$7,000/month (~$84K/yr) for GCP or Hybrid. On GCP and Hybrid, scaling is automatic — no node provisioning, storage auto-grows, and budget guardrails alert at 80%, 100%, and 130% of the configured budget.
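The budget guardrails fire when spend crosses fixed fractions of the configured budget. Cloud Billing budgets offer percentage threshold rules natively, including thresholds above 100%. The minimal sketch below only mirrors that logic and calls no GCP API. It uses the Realistic $13,750/month as an example budget.

```python
from typing import List, Sequence

ALERT_THRESHOLDS = (0.80, 1.00, 1.30)  # fractions of the configured budget


def crossed_thresholds(spend: float, budget: float,
                       thresholds: Sequence[float] = ALERT_THRESHOLDS) -> List[float]:
    """Return every alert threshold that current spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


monthly_budget = 13_750  # e.g. the Realistic GCP Full Cloud profile
print(crossed_thresholds(14_000, monthly_budget))
# prints: [0.8, 1.0]
```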
Time-to-launch comparison
| Phase | On-Premises | GCP Full Cloud | Hybrid |
|---|---|---|---|
| Infrastructure provisioning | 6–12 wks (hardware procurement) | 2–5 days (Terraform apply) | 6–8 wks (Cloud Interconnect physical circuit) |
| Platform deployment | 4–8 wks | 1–2 wks | 2–3 wks |
| Connectivity & security | 2–4 wks | 1 wk | 3–4 wks |
| First pipeline in production | 14–24 wks | 3–4 wks | 10–14 wks |
| Full production (all pipelines) | 6–9 months | 2–3 months | 4–6 months |
| Risk level | High (procurement, hardware failure) | Low (managed, auto-recovery) | Medium (network complexity) |
| Team requirement | 2+ FTE infra engineers dedicated | 0.5 FTE GCP admin | 0.5 FTE network + 0.25 FTE GCP admin |
Security and RODO compliance per scenario
RODO (Rozporządzenie o Ochronie Danych Osobowych) is the Polish name for the GDPR. The table below shows how each topology meets the key compliance requirements (✓ Full / ⚠ Partial).
| Requirement | On-Prem | GCP Full | Hybrid |
|---|---|---|---|
| RODO (Polish GDPR) | ✓ all data on-prem | ✓ GCP europe-central2 Warsaw | ✓ PII masked before GCP |
| UKE telecom regulations | ✓ | ⚠ partial — verify with legal | ✓ CDR / raw telco data stays on-prem |
| SOC 2 Type II | ⚠ manual implementation + audit | ✓ inherits GCP certification | ✓ GCP covered, on-prem manual |
| ISO 27001 | ⚠ manual audit | ✓ GCP certified | ⚠ partial |
| Encryption at rest | ✓ Vault HSM seal | ✓ Cloud KMS CMEK | ✓ both |
| Encryption in transit | ✓ internal mTLS | ✓ Google TLS 1.3 | ✓ MACsec Interconnect + mTLS |
| Network segmentation (zero-trust) | ⚠ manual VLAN + firewall | ✓ VPC Service Controls | ✓ GCP VPC + on-prem VLAN |
| Audit logging (immutable) | ⚠ ELK + Wazuh manual SIEM | ✓ Cloud Audit Logs (400-day retention) | ✓ both |
| DDoS protection | ⚠ FortiGate IPS | ✓ Cloud Armor Enterprise | ✓ Cloud Armor |
| Data Loss Prevention | ⚠ manual policies | ✓ Cloud DLP + DataFlow governance | ✓ Debezium masks + Cloud DLP |
All three topologies share the same DataFlow AI security features: Keycloak 24 for OIDC/SAML with Active Directory federation and MFA; HashiCorp Vault for dynamic 30-minute database credentials; API Gateway RBAC with 5 roles and 26 permissions; default-deny Kubernetes network policies; and 30-day rotation of all database passwords, API keys, and Kafka credentials.
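The 30-day rotation policy reduces to an age check per credential. The minimal sketch below assumes a hypothetical inventory that maps each credential name to its last rotation time. The names and timestamps are invented, and this is not how DataFlow AI or Vault actually tracks rotation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

ROTATION_PERIOD = timedelta(days=30)


def rotation_due(last_rotated: Dict[str, datetime],
                 now: Optional[datetime] = None) -> List[str]:
    """Return the names of credentials last rotated 30 or more days ago."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_rotated.items()
                  if now - ts >= ROTATION_PERIOD)


# Hypothetical inventory; names and timestamps are invented.
inventory = {
    "postgres-app-password": datetime(2026, 3, 1, tzinfo=timezone.utc),
    "kafka-sasl-credential": datetime(2026, 3, 20, tzinfo=timezone.utc),
    "partner-api-key": datetime(2026, 2, 1, tzinfo=timezone.utc),
}
print(rotation_due(inventory, now=datetime(2026, 3, 31, tzinfo=timezone.utc)))
# prints: ['partner-api-key', 'postgres-app-password']
```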
For disaster recovery, the two source documents quote different targets — note the discrepancy:
- The deployment-scenarios document quotes a warm GCP DR target of RTO < 60 seconds, RPO < 30 seconds.
- The GCP cost-analysis document quotes RTO < 4 hours, RPO < 15 minutes for the same DR setup.
Decision guide — choosing a topology
The source document provides a decision flowchart. Walk through these questions in order:
Q1. Do regulations require ALL data to stay on-premises?
YES -> On-Premises (military / government air-gap)
NO -> go to Q2
Q2. Are there large existing on-prem databases (Teradata / Oracle / SAP HANA)?
YES -> go to Q3
NO -> go to Q4
Q3. Is data transfer to the cloud acceptable, given PII masking on-prem first?
YES -> HYBRID (recommended)
NO -> On-Premises
Q4. Is the budget under $200K/year?
YES -> GCP Full Cloud (Minimum or Realistic)
NO -> GCP Full Cloud (Realistic or Pessimistic)
The Polkomtel Plus path through this flowchart: Q1 = No (regulations permit cloud), Q2 = Yes (large Teradata, Oracle, and SAP HANA estates already on-prem), Q3 = Yes (transfer is acceptable with on-prem PII masking) → Hybrid.
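The flowchart translates directly into nested conditionals. In the sketch below, the function and parameter names are invented for illustration. The budget value in the example is a placeholder, because the Hybrid path never reaches Q4.

```python
def choose_topology(all_data_must_stay_onprem: bool,
                    large_onprem_databases: bool,
                    transfer_ok_after_masking: bool,
                    annual_budget_usd: float) -> str:
    """Walk the decision flowchart (Q1-Q4) and return the suggested topology."""
    if all_data_must_stay_onprem:                 # Q1
        return "On-Premises"
    if large_onprem_databases:                    # Q2
        if transfer_ok_after_masking:             # Q3
            return "Hybrid"
        return "On-Premises"
    if annual_budget_usd < 200_000:               # Q4
        return "GCP Full Cloud (Minimum or Realistic)"
    return "GCP Full Cloud (Realistic or Pessimistic)"


# Polkomtel Plus: Q1 = No, Q2 = Yes, Q3 = Yes.
print(choose_topology(False, True, True, annual_budget_usd=242_000))
# prints: Hybrid
```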
Side-by-side summary
| Factor | On-Premises | GCP Full Cloud | Hybrid ⭐ |
|---|---|---|---|
| Upfront CapEx | 2.16M PLN (~$540K) | $0 | $0 new |
| 3-Year TCO | ~$2,286,000 | ~$495,000 (realistic) | ~$606,000–$741,000 |
| Time to first pipeline | 14–24 weeks | 3–4 weeks | 10–14 weeks |
| Elastic scaling | No — fixed hardware | Yes — fully automatic | Yes — GCP platform scales |
| Data sovereignty | Maximum | GCP Warsaw region | Source data on-prem, metadata in GCP |
| Dedicated staff | 2+ FTE infra engineers | 0.5 FTE | 0.75 FTE |
| Best fit | Air-gapped / regulatory ban on cloud | Greenfield, budget-led, fast launch | Large existing on-prem databases |
Two deployment realities
The topologies above describe the intended GKE-based GCP architecture. The platform's current live production deployment is a single Debian VPS running Docker Compose — not GKE, and not multi-region. For the build and rollout mechanics of what actually runs today, see Deployment & rollout.
Where to go next
- For the business case, ROI, and migration economics behind choosing a topology, see Business value & ROI.
- For the actual build, release, and rollout process — including the live VPS deployment — see Deployment & rollout.
- For administrative tasks once a topology is live, see the Admin guide.