Authentication & SSO
Every request to the DataFlow AI Platform is authenticated against a single Keycloak realm before it reaches any service. This page walks the full identity path — browser login, gateway JWT validation, identity-header propagation, downstream re-validation, token lifecycle, and the security hardening applied to each response.
Identity provider — Keycloak
The platform delegates all authentication to Keycloak 24, running as the OIDC / SSO identity provider. A single realm holds every client, role, group, and user.
| Property | Value |
|---|---|
| Realm | dataflow (display name "DataFlow AI") |
| Realm export | backend/docker/keycloak/realm-export.json |
| SSL requirement | sslRequired: external |
| Self-registration | registrationAllowed: false |
| Password reset | resetPasswordAllowed: true |
| Username editing | editUsernameAllowed: false |
| Login with email | loginWithEmailAllowed: true |
| Duplicate emails | duplicateEmailsAllowed: false |
Keycloak itself runs in the compose topology on host port 8180 (mapped to container 8080) and stores its data in a dedicated dataflow_keycloak database on the shared PostgreSQL instance.
Brute-force protection
The dataflow realm enables Keycloak's built-in brute-force detection. Repeated failed logins progressively slow down and then temporarily lock an account.
| Setting | Value |
|---|---|
| bruteForceProtected | true |
| failureFactor | 5 failures before lockout |
| maxFailureWaitSeconds | 900 (15 minutes) |
| minimumQuickLoginWaitSeconds | 60 |
| waitIncrementSeconds | 60 |
| permanentLockout | false (temporary lockout only) |
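To make the table concrete, the sketch below estimates how long an account stays locked after a run of consecutive failures. It assumes a simple linear model (each completed multiple of failureFactor adds waitIncrementSeconds, capped at maxFailureWaitSeconds) that approximates Keycloak's behaviour but is not copied from its source; the quick-login check governed by minimumQuickLoginWaitSeconds is left out. BruteForceSettings and lockoutSeconds are hypothetical names.

```typescript
// Hypothetical linear model of Keycloak's temporary lockout, using the dataflow realm values.
// Not Keycloak's implementation; the quick-login check is deliberately omitted.
interface BruteForceSettings {
  failureFactor: number;
  waitIncrementSeconds: number;
  maxFailureWaitSeconds: number;
}

const dataflowRealm: BruteForceSettings = {
  failureFactor: 5,
  waitIncrementSeconds: 60,
  maxFailureWaitSeconds: 900,
};

/** Seconds the account stays locked after `failures` consecutive failed logins. */
function lockoutSeconds(failures: number, s: BruteForceSettings = dataflowRealm): number {
  const steps = Math.floor(failures / s.failureFactor); // completed multiples of failureFactor
  return Math.min(steps * s.waitIncrementSeconds, s.maxFailureWaitSeconds);
}

for (const n of [4, 5, 10, 100]) {
  console.log(`${n} failures -> locked ${lockoutSeconds(n)} s`);
}
// 4 -> 0, 5 -> 60, 10 -> 120, 100 -> 900 (capped by maxFailureWaitSeconds)
```

Because permanentLockout is false, even a long run of failures only ever yields a bounded wait under this model.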
Active Directory federation
For production, Keycloak federates users from Polkomtel's corporate directory. The admin guide describes an LDAP user-federation provider pointed at ldaps://ad.polkomtel.internal:636 in READ_ONLY edit mode, with AD-group → Keycloak-role group mappers and periodic changed-user synchronisation. Users authenticate with their existing corporate credentials; their AD group membership determines their platform role.
Token lifespans live in the admin console
The admin guide specifies an access-token lifespan of 15 minutes and an SSO session idle of 30 minutes. These values are configured in the Keycloak admin console — they are not pinned in realm-export.json. Keycloak's own default access-token lifespan is 5 minutes, which the frontend code comments call out explicitly.
OAuth2 clients
The realm defines two OAuth2 clients with deliberately different trust profiles — one public client for the browser SPA, one confidential client for backend service-to-service calls.
| Client | Type | Flows enabled | Purpose |
|---|---|---|---|
| dataflow-app | public SPA | Standard flow (authorization code) + direct access grants; implicit OFF; service accounts OFF | The frontend web application |
| dataflow-api | confidential | Service accounts ON; standard flow OFF; direct access grants OFF; implicit OFF | Backend service-to-service (client-credentials) |
dataflow-app — the public SPA client
Because a single-page application cannot keep a secret, dataflow-app is a public client that relies on PKCE with the S256 challenge method (pkce.code.challenge.method: S256) to protect the authorization-code exchange. Its configuration:
- Redirect URIs include the localhost development ports (5173, 4173, 3000, 3006) and the production origin https://etl.exai.cloud/*.
- Web origins mirror the redirect URIs and add the + entry, which tells Keycloak to allow every origin derived from the valid redirect URIs (it is not a general * wildcard).
- Default client scopes are web-origins, acr, profile, roles, and email.
- Scope mappings grant dataflow-app all six realm roles.
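The S256 transformation that protects this public client is small enough to show in full. A minimal sketch of RFC 7636's S256 method, using Node's crypto module for brevity (keycloak-js performs the equivalent in the browser when pkceMethod is 'S256'); createCodeVerifier and s256Challenge are illustrative names, not keycloak-js exports.

```typescript
import { createHash, randomBytes } from "node:crypto";

// PKCE (RFC 7636), S256 method.

/** A random code_verifier: 32 bytes -> 43 base64url characters (RFC 7636 allows 43-128). */
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

/** code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))), without padding. */
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Test vector from RFC 7636, Appendix B.
console.log(s256Challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"));
// E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

The browser sends only the challenge (with code_challenge_method=S256) on the authorization redirect and reveals the verifier on the code exchange, so an intercepted authorization code is useless without the verifier held in the SPA.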
dataflow-api — the confidential service client
dataflow-api is a confidential client used for client-credentials (service-to-service) flows. Its secret ships as the placeholder REPLACE_IN_KEYCLOAK in the realm export and must be set in the Keycloak admin console. This is the same audience that downstream services require in the aud claim (see JWT validation).
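A backend holding the dataflow-api secret would obtain a token with the standard OAuth2 client-credentials grant against Keycloak's OIDC token endpoint. The sketch below only builds the request, so it runs offline; clientCredentialsRequest and its placeholder guard are hypothetical, and the real services may use a Spring OAuth2 client instead.

```typescript
// Hypothetical builder for a client-credentials token request against Keycloak.
interface TokenRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function clientCredentialsRequest(
  keycloakUrl: string,
  realm: string,
  clientId: string,
  clientSecret: string,
): TokenRequest {
  // The realm export ships a placeholder; refuse to use it.
  if (clientSecret === "REPLACE_IN_KEYCLOAK") {
    throw new Error("dataflow-api secret is still the realm-export placeholder");
  }
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: `${keycloakUrl}/realms/${encodeURIComponent(realm)}/protocol/openid-connect/token`,
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

const req = clientCredentialsRequest("http://localhost:8180", "dataflow", "dataflow-api", "s3cret");
console.log(req.url); // http://localhost:8180/realms/dataflow/protocol/openid-connect/token
// Sending it: await fetch(req.url, { method: "POST", headers: req.headers, body: req.body })
```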
Doc drift on the client name
The admin guide §3.1 names the SPA client dataflow-ui. The actual realm export and the frontend (VITE_KEYCLOAK_CLIENT_ID default) both use dataflow-app. Always trust the realm export and frontend configuration over the admin guide here.
The OIDC login flow
The frontend uses the keycloak-js adapter. On startup it constructs a singleton Keycloak instance (frontend/src/auth/keycloak.ts) from three environment variables:
| Variable | Default |
|---|---|
| VITE_KEYCLOAK_URL | http://localhost:8180 |
| VITE_KEYCLOAK_REALM | dataflow |
| VITE_KEYCLOAK_CLIENT_ID | dataflow-app |
Initialisation (getKeycloakInitOptions) uses onLoad: 'check-sso', a silent SSO check via /silent-check-sso.html, pkceMethod: 'S256', and checkLoginIframe: false.
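A sketch of how the three variables and the init options might be assembled, with the env record and page origin passed in as parameters so the logic runs outside Vite. keycloakConfig and initOptions are hypothetical names (the real module reads import.meta.env directly); onLoad, silentCheckSsoRedirectUri, pkceMethod, and checkLoginIframe are genuine keycloak-js init options.

```typescript
// Hypothetical assembly of the Keycloak config and init options described above.
type Env = Partial<
  Record<"VITE_KEYCLOAK_URL" | "VITE_KEYCLOAK_REALM" | "VITE_KEYCLOAK_CLIENT_ID", string>
>;

interface KeycloakConfigSketch {
  url: string;
  realm: string;
  clientId: string;
}

function keycloakConfig(env: Env): KeycloakConfigSketch {
  return {
    url: env.VITE_KEYCLOAK_URL ?? "http://localhost:8180",
    realm: env.VITE_KEYCLOAK_REALM ?? "dataflow",
    clientId: env.VITE_KEYCLOAK_CLIENT_ID ?? "dataflow-app",
  };
}

function initOptions(origin: string) {
  return {
    onLoad: "check-sso" as const, // silent check, no forced login
    silentCheckSsoRedirectUri: `${origin}/silent-check-sso.html`,
    pkceMethod: "S256" as const,
    checkLoginIframe: false,
  };
}

// With keycloak-js: new Keycloak(keycloakConfig(import.meta.env)).init(initOptions(window.location.origin))
console.log(keycloakConfig({}), initOptions("http://localhost:3006"));
```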
Sequence diagram
```
Browser            Keycloak (:8180)              API Gateway (:8085)
│                          │                              │
│ (1) load SPA from nginx :3006                           │
│                          │                              │
│ (2) check-sso / login redirect (PKCE S256)              │
│─────────────────────────▶│                              │
│                          │                              │
│ (3) user authenticates (corporate AD credentials)       │
│◀─────────────────────────│                              │
│                          │                              │
│ (4) redirect back to origin "/" with authorization code │
│◀─────────────────────────│                              │
│                          │                              │
│ (5) keycloak-js exchanges code + PKCE verifier          │
│─────────────────────────▶│                              │
│ (6) access token + refresh token (JWT, RS256)           │
│◀─────────────────────────│                              │
│                          │                              │
│ (7) buildProfileFromKeycloak() parses token claims      │
│ (8) signalAuthReady(): first API requests unblocked     │
│                          │                              │
│ (9) GET /api/v1/... Authorization: Bearer <JWT>         │
│────────────────────────────────────────────────────────▶│
│                          │                              │
│ (10) gateway validates JWT, injects                     │
│      X-User-* headers, proxies on                       │
│◀────────────────────────────────────────────────────────│
```
What happens in the browser
- AuthProvider.tsx runs keycloakInstance.init(...) inside a Promise.race guarded by a 15-second timeout (KEYCLOAK_INIT_TIMEOUT_MS; previously 4 s, raised to fix an SSO redirect loop).
- On success, buildProfileFromKeycloak() parses the token into a KeycloakUserProfile: subject, name, email, realm_access.roles, and the workspace_id / workspace claim.
- The Keycloak instance is exposed as window.__keycloak for diagnostics.
- login() defaults the post-login redirect target to origin / rather than the current URL, so users do not bounce back to /login.
- logout() redirects to /login.
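The timeout guard from the first bullet can be sketched as a generic helper. withTimeout is a hypothetical name, and clearing the timer in finally is a choice of this sketch so that a fast initialisation leaves no pending timer behind.

```typescript
// Hypothetical Promise.race timeout guard; the constant mirrors the documented 15 s value.
const KEYCLOAK_INIT_TIMEOUT_MS = 15_000;

function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// In the provider (keycloakInstance and getKeycloakInitOptions come from keycloak.ts):
// const authenticated = await withTimeout(
//   keycloakInstance.init(getKeycloakInitOptions()), KEYCLOAK_INIT_TIMEOUT_MS, "Keycloak init");
console.log(`init guard: ${KEYCLOAK_INIT_TIMEOUT_MS} ms`);
```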
Gating the first API requests
The axios client (frontend/src/api/client.ts) obtains the access token through a registered token accessor (registerTokenAccessor). A signalAuthReady() / waitForAuthReady promise gates the first requests until Keycloak initialisation finishes, preventing an unauthenticated first wave of calls.
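One way to build such a gate is a one-shot promise that every outgoing request awaits and Keycloak initialisation resolves. The function names mirror the doc, but the signatures below are assumptions, and authorizationHeader is a hypothetical stand-in for the axios request interceptor.

```typescript
// Hypothetical readiness gate and token-accessor registry.
type TokenAccessor = () => string | null;

let tokenAccessor: TokenAccessor = () => null;
let resolveReady: () => void = () => {};
const authReady = new Promise<void>((resolve) => {
  resolveReady = resolve; // the executor runs synchronously, so this is set immediately
});

function registerTokenAccessor(fn: TokenAccessor): void {
  tokenAccessor = fn;
}

/** Called once Keycloak init finishes; later calls are harmless no-ops. */
function signalAuthReady(): void {
  resolveReady();
}

function waitForAuthReady(): Promise<void> {
  return authReady;
}

/** What a request interceptor would do: wait for auth, then attach the bearer token if any. */
async function authorizationHeader(): Promise<Record<string, string>> {
  await waitForAuthReady();
  const token = tokenAccessor();
  return token ? { Authorization: `Bearer ${token}` } : {};
}
```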
JWT validation at the gateway
All client traffic enters through the API Gateway (reactive Spring Cloud Gateway, WebFlux) under /api/v1/**. The gateway is configured as an OAuth2 resource server and validates the bearer JWT on every request.
The gateway's SecurityConfig.kt disables CSRF, HTTP Basic, and form login, and wires a ReactiveKeycloakJwtConverter to translate Keycloak roles into Spring authorities.
What is checked
| Check | Detail |
|---|---|
| Signature | RS256, verified against Keycloak's JWKS endpoint |
| Issuer (iss) | Must equal the Keycloak realm URL |
| Expiry (exp) | Expired tokens are rejected |
| Audience (aud) | Must contain dataflow-api (dataflow.security.jwt.audience, FA-004); tokens with no aud are rejected |
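Assuming the RS256 signature has already been verified against JWKS, the remaining checks reduce to a few comparisons on the payload. checkClaims is a hypothetical helper, not the gateway's code (which relies on Spring Security plus JwtAuthenticator); Spring's default timestamp validator also tolerates 60 s of clock skew, which this sketch ignores.

```typescript
// Hypothetical post-signature claim checks: iss, exp, aud.
interface JwtClaims {
  iss?: string;
  exp?: number; // epoch seconds
  aud?: string | string[];
}

function checkClaims(claims: JwtClaims, expectedIssuer: string, audience: string, nowSec: number): string[] {
  const errors: string[] = [];
  if (claims.iss !== expectedIssuer) errors.push("iss mismatch");
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) errors.push("token expired");
  // aud may be a single string or an array; a missing aud fails the check.
  const aud = claims.aud === undefined ? [] : Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!aud.includes(audience)) errors.push(`aud does not contain ${audience}`);
  return errors;
}

const issuer = "http://localhost:8180/realms/dataflow";
const now = 1_700_000_000;
console.log(checkClaims({ iss: issuer, exp: now + 300, aud: ["dataflow-api", "account"] }, issuer, "dataflow-api", now)); // []
```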
Audience enforcement is performed by JwtAuthenticator. The ReactiveKeycloakJwtConverter logs every accepted JWT at INFO (principal, subject, issuer, expiry, realm roles, granted authorities).
Public paths
A small set of paths bypass authentication entirely (permitAll): OPTIONS /**, /actuator/**, /actuator/health/**, /api/v1/health, and /health/**.
Identity-header propagation
After the JWT is validated, the gateway's AuthFilter — a GlobalFilter registered at order −90 — extracts the token claims and injects six X-User-* identity headers into the request forwarded downstream.
| Header | Source |
|---|---|
| X-User-Id | jwt.subject |
| X-User-Email | email claim (fallback <sub>@polkomtel.pl) |
| X-User-Display-Name | name / preferred_username / email local-part |
| X-User-Role | Result of RBACService.mapKeycloakRolesToDataFlowRole(...) |
| X-Workspace-Id | workspace_id claim (fallback default) |
| X-User-Groups | groups claim, comma-joined |
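The table's fallbacks can be expressed as one pure mapping from claims to headers. This is a hypothetical sketch of AuthFilter's logic: identityHeaders is an illustrative name, mapRole stands in for RBACService.mapKeycloakRolesToDataFlowRole (whose rules belong to RBAC & permissions), and feeding it realm_access.roles is an assumption.

```typescript
// Hypothetical claim-to-header mapping following the fallbacks in the table.
interface Claims {
  sub: string;
  email?: string;
  name?: string;
  preferred_username?: string;
  workspace_id?: string;
  groups?: string[];
  realm_access?: { roles?: string[] };
}

function identityHeaders(c: Claims, mapRole: (roles: string[]) => string): Record<string, string> {
  const email = c.email ?? `${c.sub}@polkomtel.pl`;
  return {
    "X-User-Id": c.sub,
    "X-User-Email": email,
    "X-User-Display-Name": c.name ?? c.preferred_username ?? (email.split("@")[0] ?? email),
    "X-User-Role": mapRole(c.realm_access?.roles ?? []),
    "X-Workspace-Id": c.workspace_id ?? "default",
    "X-User-Groups": (c.groups ?? []).join(","),
  };
}
```

Because all six keys are always produced for a JWT-authenticated request, writing them onto the forwarded request replaces any client-supplied X-User-* values.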
Public paths (/actuator, /api/v1/health, /health) bypass the filter. Non-JWT requests (for example cookie-only or unauthenticated requests) pass through without identity headers and are logged only at DEBUG.
Downstream services trust these headers
Downstream services consume the X-User-* headers. For a JWT-authenticated request the gateway overwrites all six headers, so client-supplied spoofs are replaced. However, an unauthenticated pass-through request is not stripped of inbound X-User-* headers. Safety therefore depends on the gateway being the only ingress and on downstream services always re-validating the JWT — which they do.
JWKS re-validation in servlet services
The six downstream services are Spring MVC (servlet) applications. They do not blindly trust the gateway — they independently re-validate the same JWT, a defense-in-depth design.
Each service's common/security/SecurityConfig is annotated @EnableWebSecurity and @EnableMethodSecurity(prePostEnabled = true), runs stateless sessions, and disables CSRF. As an OAuth2 resource server it decodes tokens with:
```kotlin
// jwkSetUri is bound from spring.security.oauth2.resourceserver.jwt.jwk-set-uri
NimbusJwtDecoder.withJwkSetUri(jwkSetUri).build()
```
A KeycloakJwtConverter maps the token's roles to authorities, so @PreAuthorize("hasAnyRole(...)") evaluates real Keycloak roles, not SCOPE_* claims.
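The difference between role-based and scope-based authorities is easiest to see in code. This hypothetical sketch assumes roles are read from realm_access.roles and turned into upper-case ROLE_ authorities; the real KeycloakJwtConverter's prefixing and casing rules are not documented here. Spring Security's default converter would instead emit SCOPE_ authorities from the scope claim, which hasAnyRole(...) never matches because it compares against ROLE_-prefixed names.

```typescript
// Hypothetical Keycloak-role-to-authority conversion (prefix and casing are assumptions).
interface RealmClaims {
  realm_access?: { roles?: string[] };
}

function toAuthorities(claims: RealmClaims): string[] {
  const roles = claims.realm_access?.roles ?? [];
  return Array.from(new Set(roles.map((r) => `ROLE_${r.toUpperCase()}`)));
}

/** Mirrors hasAnyRole: each argument is compared with the ROLE_ prefix added. */
function hasAnyRole(authorities: string[], ...roles: string[]): boolean {
  return roles.some((r) => authorities.includes(`ROLE_${r}`));
}
```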
After re-validation, a SecurityContextPopulatingInterceptor reads the gateway's X-User-* headers into a thread-local SecurityContextHolder. The SecurityContext data class exposes isAdmin, isEngineerOrAbove, and isAnalystOrAbove helpers. Thread-local cleanup is guaranteed three ways (FA-005):
- The interceptor's afterCompletion callback.
- A SecurityContextCleanupFilter wrapping every /api/* request in try/finally.
- A SecurityContextTaskDecorator for @Async thread pools.
Token lifecycle & refresh
The access token is short-lived and refreshed silently in the background so the user is never interrupted by a re-login.
| Constant | Value | Purpose |
|---|---|---|
| TOKEN_REFRESH_BUFFER_SECONDS | 30 | Refresh this many seconds before exp |
| SESSION_TIMEOUT_WARNING_SECONDS | 60 | Show the timeout modal this long before expiry |
| MIN_TOKEN_VALIDITY_SECONDS | 60 | Minimum validity passed to updateToken |
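The first two constants translate into two timer delays. A sketch of that arithmetic, assuming exp is in epoch seconds (as in the JWT) and the clock in milliseconds; planTimers is a hypothetical name for what AuthProvider.scheduleTokenRefresh and the modal scheduling compute.

```typescript
// Hypothetical timer arithmetic for silent refresh and the timeout warning.
const TOKEN_REFRESH_BUFFER_SECONDS = 30;
const SESSION_TIMEOUT_WARNING_SECONDS = 60;

interface RefreshPlan {
  refreshInMs: number; // delay before calling kc.updateToken
  warnInMs: number; // delay before showing SessionTimeoutModal
}

function planTimers(expSec: number, nowMs: number): RefreshPlan {
  const msUntilExpiry = expSec * 1000 - nowMs;
  return {
    refreshInMs: Math.max(0, msUntilExpiry - TOKEN_REFRESH_BUFFER_SECONDS * 1000),
    warnInMs: Math.max(0, msUntilExpiry - SESSION_TIMEOUT_WARNING_SECONDS * 1000),
  };
}

// A 15-minute token issued at t = 0: refresh at 14:30, warning at 14:00.
console.log(planTimers(900, 0)); // { refreshInMs: 870000, warnInMs: 840000 }
```

With Keycloak's default 5-minute access token, the old 300-second warning window equals the token's whole lifetime, so warnInMs would be 0 and the modal would open at once, which matches the symptom the reduction to 60 s fixed.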
- Silent refresh: AuthProvider.scheduleTokenRefresh calls kc.updateToken to refresh the token 30 s before expiry. If the refresh fails, the session is cleared and isAuthenticated becomes false.
- Session timeout warning: a SessionTimeoutModal appears 60 s before expiry (reduced from 5 minutes, which had caused the modal to appear immediately after login). The "Extend session" action calls updateToken(-1) to force a refresh.
- 401 retry: the axios response interceptor implements a 401 → silent token refresh → retry original request flow, so a token that expired mid-flight is transparently renewed.
- Logout: clears the refresh and session timers, resets all React state, and calls keycloakInstance.logout.
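The 401 → refresh → retry flow is sketched below against a minimal send function instead of axios so it runs standalone. createAuthedClient is hypothetical; sharing one in-flight refresh among concurrent 401s and retrying only once are design choices of this sketch, not confirmed details of client.ts.

```typescript
// Hypothetical 401 -> refresh -> retry-once wrapper.
type Send = (token: string) => Promise<{ status: number; body: string }>;

function createAuthedClient(getToken: () => string, refresh: () => Promise<void>) {
  let refreshing: Promise<void> | null = null;

  return async function request(send: Send) {
    const first = await send(getToken());
    if (first.status !== 401) return first;
    // Requests that hit 401 together share a single refresh.
    refreshing ??= refresh().finally(() => {
      refreshing = null;
    });
    await refreshing; // a failed refresh rejects here and propagates to the caller
    return send(getToken()); // retry exactly once with the renewed token
  };
}
```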
Dev-mode fallback
For local development without a running Keycloak, the frontend ships a mock auth path in frontend/src/auth/devAuth.ts.
- When VITE_AUTH_MODE=dev or VITE_AUTH_ENABLED=false (only in the Vite dev server), a persona switcher offers four mock profiles (engineer, analyst, admin, steward) that are indistinguishable from real Keycloak profiles. The dev token accessor returns null.
- AuthProvider only falls back to dev mode when allowDevModeFallback is explicitly set. Otherwise a Keycloak init failure leaves the user unauthenticated; there is no implicit grant.
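The mode decision can be written as a small pure function. This is a hypothetical sketch: isDevServer stands in for Vite's import.meta.env.DEV, resolveAuthMode and AuthMode are illustrative names rather than the devAuth.ts API, and whether the allowDevModeFallback path also requires the dev server is not documented, so it is not enforced here.

```typescript
// Hypothetical auth-mode decision mirroring the two bullets above.
type AuthEnv = { VITE_AUTH_MODE?: string; VITE_AUTH_ENABLED?: string };
type AuthMode = "keycloak" | "dev" | "unauthenticated";

function devAuthRequested(env: AuthEnv, isDevServer: boolean): boolean {
  // Outside the Vite dev server these switches are ignored.
  return isDevServer && (env.VITE_AUTH_MODE === "dev" || env.VITE_AUTH_ENABLED === "false");
}

function resolveAuthMode(
  env: AuthEnv,
  isDevServer: boolean,
  keycloakInitOk: boolean,
  allowDevModeFallback: boolean,
): AuthMode {
  if (devAuthRequested(env, isDevServer)) return "dev";
  if (keycloakInitOk) return "keycloak";
  // A failed Keycloak init never grants access implicitly.
  return allowDevModeFallback ? "dev" : "unauthenticated";
}
```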
The gateway dev escape hatch
The backend has a parallel switch: dataflow.gateway.dev-permit-reads (default false). When true:
- The gateway permits all GET requests plus a few always-safe POSTs (copilot, ai, catalog/ask, search) without a JWT.
- Downstream SecurityConfig grants the anonymous user a broad set of roles.
Production must keep dev-permit-reads false
dev-permit-reads=true permits unauthenticated reads and grants the anonymous user a wide role set across services. It must be false in production. The compose file defaults DATAFLOW_GATEWAY_DEV_PERMIT_READS to true for local development — a deployment that forgets to override it would expose unauthenticated reads.
Security hardening
Every response that leaves the gateway is hardened by a chain of per-route filters. The platform describes this posture as SOC 2 / OWASP-aligned.
Security headers
SecurityHeadersFilter runs at order HIGHEST_PRECEDENCE + 10 and writes headers in a beforeCommit hook (so they land before the response is committed).
| Header | Value |
|---|---|
| X-Content-Type-Options | nosniff |
| X-Frame-Options | DENY |
| X-XSS-Protection | 0 |
| Referrer-Policy | strict-origin-when-cross-origin |
| Permissions-Policy | geolocation=(), camera=(), microphone=() |
| Cache-Control | no-store (configurable) |
| Strict-Transport-Security | max-age=31536000; includeSubDomains; preload (when includeHsts) |
| Content-Security-Policy | default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self'; connect-src 'self'; frame-ancestors 'none'; base-uri 'self'; form-action 'self' (when includeCsp) |
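Expressed as a pure function, the header set makes the includeHsts and includeCsp switches explicit. A sketch only: securityHeaders and SecurityHeaderOptions are hypothetical names, and how the switches bind to gateway configuration is not shown.

```typescript
// Hypothetical pure-function view of the headers in the table.
interface SecurityHeaderOptions {
  includeHsts: boolean;
  includeCsp: boolean;
  cacheControl?: string; // "no-store" unless overridden
}

const CSP = [
  "default-src 'self'",
  "script-src 'self'",
  "style-src 'self' 'unsafe-inline'",
  "img-src 'self' data:",
  "font-src 'self'",
  "connect-src 'self'",
  "frame-ancestors 'none'",
  "base-uri 'self'",
  "form-action 'self'",
].join("; ");

function securityHeaders(opts: SecurityHeaderOptions): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "X-XSS-Protection": "0", // disables the legacy browser XSS auditor
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Permissions-Policy": "geolocation=(), camera=(), microphone=()",
    "Cache-Control": opts.cacheControl ?? "no-store",
  };
  if (opts.includeHsts) {
    headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains; preload";
  }
  if (opts.includeCsp) headers["Content-Security-Policy"] = CSP;
  return headers;
}
```

frame-ancestors 'none' is the CSP counterpart of X-Frame-Options: DENY; sending both covers browsers that honour only one of them.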
TLS
The realm enforces sslRequired: external. The gateway emits the HSTS preload header above; the admin guide additionally requires Keycloak to be served over HTTPS on port 443 and database connections to use sslmode=verify-full.
CORS
CorsConfig registers a CorsWebFilter on /**. Allowed origins come from dataflow.cors.allowed-origins (default http://localhost:5173, http://localhost:3000, https://app.dataflow.polkomtel.pl, https://staging.dataflow.polkomtel.pl). Methods are GET, POST, PUT, PATCH, DELETE, OPTIONS, HEAD, allowCredentials is true, and maxAge defaults to 3600 s. Exposed headers include the rate-limit headers, ETag, and Content-Disposition.
Rate limiting
RedisRateLimitFilter is a Redis sliding-window (sorted-set) limiter keyed rate_limit:{clientIp}:{endpoint}. The client identifier in that key is taken from X-Forwarded-For, falling back to X-User-Id, then the remote address, and finally the literal anonymous.
| Endpoint group | Requests per minute |
|---|---|
| /api/v1/ai | 30 |
| migration | 100 |
| pipelines / connections / runs / lineage / monitor / default | 200 |
Responses carry X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset; an over-limit request returns HTTP 429 with Retry-After. The limiter is fail-closed by default (dataflow.gateway.rate-limit.fail-closed=true, FR-019): if Redis is unavailable the gateway returns HTTP 503 with X-RateLimit-Status: degraded rather than silently dropping protection, and a 30-second circuit breaker delays retrying Redis. Setting fail-closed=false is a dev-only fail-open opt-out.
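The sliding-window log behind the limiter can be sketched in memory. In Redis, each key would be a sorted set scored by timestamp: ZREMRANGEBYSCORE trims entries older than the window, ZCARD counts the rest, and ZADD records the request. Here a Map of arrays stands in for Redis; the endpoint-group matching, class and function names, and header values are assumptions based on the table, and Retry-After and the 30-second circuit breaker are omitted.

```typescript
// In-memory sketch of a sliding-window log limiter with a fail-closed switch.
const WINDOW_MS = 60_000;

function limitFor(endpoint: string): number {
  if (endpoint.startsWith("/api/v1/ai")) return 30;
  if (endpoint.includes("migration")) return 100;
  return 200; // pipelines, connections, runs, lineage, monitor, default
}

interface ClientHints { xForwardedFor?: string; xUserId?: string; remoteAddress?: string }

/** X-Forwarded-For (first entry), then X-User-Id, then remote address, then "anonymous". */
function clientId(h: ClientHints): string {
  const forwarded = h.xForwardedFor?.split(",")[0]?.trim();
  return forwarded || h.xUserId || h.remoteAddress || "anonymous";
}

interface Decision { status: 200 | 429 | 503; headers: Record<string, string> }

class SlidingWindowLimiter {
  private readonly log = new Map<string, number[]>(); // stands in for Redis sorted sets
  private readonly failClosed: boolean;

  constructor(failClosed = true) {
    this.failClosed = failClosed;
  }

  check(client: string, endpoint: string, nowMs: number, storeAvailable = true): Decision {
    if (!storeAvailable) {
      // Fail-closed refuses traffic instead of letting it through unlimited.
      return this.failClosed
        ? { status: 503, headers: { "X-RateLimit-Status": "degraded" } }
        : { status: 200, headers: {} };
    }
    const key = `rate_limit:${client}:${endpoint}`;
    const limit = limitFor(endpoint);
    // Trim entries outside the window (ZREMRANGEBYSCORE), then count (ZCARD).
    const recent = (this.log.get(key) ?? []).filter((t) => t > nowMs - WINDOW_MS);
    const allowed = recent.length < limit;
    if (allowed) recent.push(nowMs); // record only accepted requests (ZADD)
    this.log.set(key, recent);
    return {
      status: allowed ? 200 : 429,
      headers: {
        "X-RateLimit-Limit": String(limit),
        "X-RateLimit-Remaining": String(limit - recent.length),
      },
    };
  }
}
```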
Audit logging
AuditInterceptor intercepts POST, PUT, PATCH, and DELETE on /api/** (skipping health and actuator). It performs a dual write:
- Structured JSON through the SLF4J logger com.polkomtel.dataflow.audit, at a level chosen by status (2xx → INFO, 4xx → WARN, otherwise ERROR), for SIEM ingestion.
- Asynchronous DB persistence via an optional AuditPersistence bean on a single daemon thread.
Each entry captures the event ID, request ID, action (CREATE / UPDATE / PARTIAL_UPDATE / DELETE), method, path, resource type and ID, status code, duration, actor (user ID / email / role / workspace from the X-User-* headers), remote address, success flag, category, and any error type and message. Audit logs are retained 365 days and exportable to SIEM; AuditLogController read access requires hasRole('ADMIN').
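The selection and log-level rules are compact enough to sketch. This is hypothetical: the method-to-action mapping is inferred from the order in which methods and actions are listed, the health/actuator path test is an assumed implementation of the documented exclusion, and auditAction and auditLevel are illustrative names.

```typescript
// Hypothetical sketch of AuditInterceptor's selection and log-level rules.
type AuditAction = "CREATE" | "UPDATE" | "PARTIAL_UPDATE" | "DELETE";
type LogLevel = "INFO" | "WARN" | "ERROR";

const ACTION_BY_METHOD = new Map<string, AuditAction>([
  ["POST", "CREATE"],
  ["PUT", "UPDATE"],
  ["PATCH", "PARTIAL_UPDATE"],
  ["DELETE", "DELETE"],
]);

/** The audit action for a request, or undefined when it is not audited. */
function auditAction(method: string, path: string): AuditAction | undefined {
  if (!path.startsWith("/api/")) return undefined;
  if (/\/(health|actuator)(\/|$)/.test(path)) return undefined; // skip health and actuator
  return ACTION_BY_METHOD.get(method.toUpperCase());
}

function auditLevel(status: number): LogLevel {
  if (status >= 200 && status < 300) return "INFO";
  if (status >= 400 && status < 500) return "WARN";
  return "ERROR"; // 5xx and anything else
}
```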
PII masking
PiiMaskingFilter (a Spring Cloud Gateway filter at order LOWEST_PRECEDENCE - 10) is the only filter that intercepts response bodies, and it is applied to every route. It recognises seven PII patterns:
| Pattern | Example mask |
|---|---|
| EMAIL | u***@e*****.com |
| POLISH_PHONE | +48***456*** |
| PESEL | 123****8901 |
| CREDIT_CARD | 4532****9012 |
| IBAN | PL12****3456 |
| IP_ADDRESS | 192.168.*.* |
| POLISH_POSTAL_CODE | 00-*** |
Masking only applies to error responses
After a 2026-05-20 refactor, MaskingResponseDecorator passes successful responses (status < 400) through verbatim and unmasked — only error responses (status ≥ 400) with a maskable content type are buffered and scrubbed. The change fixed a streaming-truncation hang where DataBufferUtils.join dropped the terminating chunk. The stated rationale is that successful JSON payloads travel through versioned DTOs and carry no free-form PII. Raw PII in a 2xx body would therefore not be masked at the gateway.
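The post-refactor behaviour amounts to a status gate in front of the pattern replacements. This hypothetical sketch implements only two of the seven masks (IP_ADDRESS and POLISH_POSTAL_CODE) so that they reproduce the table's examples; the regexes, the list of maskable content types, and the function names are assumptions, not PiiMaskingFilter's actual rules.

```typescript
// Hypothetical status gate plus two illustrative masks.
const IPV4 = /\b(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}\b/g;
const POSTAL_CODE = /\b(\d{2})-\d{3}\b/g;
const MASKABLE_TYPES = ["application/json", "application/problem+json", "text/plain"];

function maskPii(text: string): string {
  return text
    .replace(IPV4, "$1.$2.*.*") // 192.168.1.10 -> 192.168.*.*
    .replace(POSTAL_CODE, "$1-***"); // 00-950 -> 00-***
}

/** Only error responses (status >= 400) with a maskable content type are scrubbed. */
function shouldMask(status: number, contentType?: string): boolean {
  if (status < 400 || !contentType) return false;
  const ct = contentType.toLowerCase();
  return MASKABLE_TYPES.some((t) => ct.startsWith(t));
}

function gatewayBody(status: number, contentType: string | undefined, body: string): string {
  return shouldMask(status, contentType) ? maskPii(body) : body;
}

const body = '{"error":"lookup failed","ip":"192.168.1.10","zip":"00-950"}';
console.log(gatewayBody(500, "application/json", body));
// {"error":"lookup failed","ip":"192.168.*.*","zip":"00-***"}
```

Calling gatewayBody(200, "application/json", body) returns the body unchanged, which is exactly the 2xx exposure the callout describes.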
Summary
| Layer | Mechanism |
|---|---|
| Identity provider | Keycloak 24, realm dataflow, AD federation, brute-force protection |
| Browser login | keycloak-js, public client dataflow-app, PKCE S256, check-sso |
| Gateway validation | OAuth2 resource server — RS256 signature, iss, exp, aud=dataflow-api |
| Identity propagation | Six X-User-* headers injected by AuthFilter |
| Downstream re-validation | NimbusJwtDecoder JWKS + thread-local SecurityContext |
| Token lifecycle | 15-minute access token, silent refresh, 401-retry |
| Hardening | Security headers, HSTS/CSP, TLS, CORS, fail-closed rate limiting, audit log, PII masking |
Authorization — what an authenticated identity is actually allowed to do — is covered separately in RBAC & permissions.