Developer guide
This guide is for engineers building on the DataFlow AI Platform. It covers the monorepo layout, how to set up a local development environment, the build and test commands for each component, the coding standards the project enforces, the Docker Compose dev loop, debugging, the contributing workflow, and how to add a new backend service or frontend feature.
Architecture at a glance
DataFlow AI is a microservices platform with domain-driven service boundaries. An API gateway fronts six backend services that share a single PostgreSQL database.
                         +-------------------+
                         |    API Gateway    |
                         |  (Spring Cloud)   |
                         |     port 8080     |
                         +---------+---------+
                                   |
     +-----------+-----------+-----+-----+-----------+-----------+
     |           |           |           |           |           |
+----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
|Metadata | | Engine  | | Lineage | | Monitor | |   AI    | | Migrate |
| Service | |         | | Service | | Service | | Copilot | | Engine  |
|  :8081  | |  :8082  | |  :8084  | |  :8085  | |  :8086  | |  :8087  |
+----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
     |           |           |           |           |           |
     +-----------+-----------+-----+-----+-----------+-----------+
                                   |
                            +------+------+
                            | PostgreSQL  |
                            | (shared DB) |
                            +-------------+
Service responsibilities
| Service | Language | Port | Responsibility |
|---|---|---|---|
| api-gateway | Kotlin | 8080 | Routing, rate limiting, auth forwarding, SSE/WS proxy |
| metadata-service | Kotlin | 8081 | Pipeline CRUD, connections, quality rules, catalog, audit |
| pipeline-engine | Kotlin | 8082 | Compilation, execution, scheduling, Git, orchestration |
| lineage-service | Kotlin | 8084 | Lineage graph, column lineage, impact analysis, search |
| monitor-service | Kotlin | 8085 | Dashboard metrics, alerts, SSE streaming, system health |
| copilot (AI) | Python | 8086 | NL-to-pipeline, chat, SQL generation, RAG, suggestions |
| migration-engine | Python | 8087 | PowerCenter XML / Alteryx YXMD to YAML conversion |
| cli | Go | — | Command-line interface for developers |
Communication is synchronous REST between the gateway and services, SSE for alert/metrics streaming, and WebSocket for pipeline status. Kafka for event-driven lineage ingestion is planned but not yet wired.
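For orientation, SSE is a line-oriented text format: each event is a run of `field: value` lines terminated by a blank line. The sketch below parses such a stream using only the standard library. It is a simplified illustration, not the platform's client code, and it says nothing about the event names or payloads that monitor-service actually emits.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event: dict = {}
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it if anything was collected.
            if event or data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # One leading space after the colon is not part of the value.
        if field == "data":
            data.append(value)  # Multiple data lines are joined with newlines.
        else:
            event[field] = value
```

Any iterable of lines works as input, for example a file object or a streamed HTTP response body decoded line by line.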
Monorepo layout
The repository is a polyglot monorepo. The top-level directories that matter for development are:
polcomtel/
├── backend/
│ ├── platform/ Kotlin services (Gradle multi-project build)
│ │ ├── api-gateway/
│ │ ├── metadata-service/
│ │ ├── pipeline-engine/
│ │ ├── lineage-service/
│ │ ├── monitor-service/
│ │ ├── common/ Shared model + enums
│ │ ├── connector-sdk/ Connector framework + 21 connectors
│ │ ├── pushdown-sql/ SQL dialect / push-down support
│ │ ├── integration-tests/
│ │ ├── settings.gradle.kts
│ │ └── gradlew
│ ├── ai-services/ Python services (Poetry)
│ │ ├── copilot/
│ │ └── migration-engine/
│ ├── cli/ Go CLI (cmd/, internal/, pkg/)
│ ├── docker/
│ ├── docker-compose.yml
│ └── Makefile
├── frontend/ React + Vite + TypeScript SPA
│ └── src/ api/, auth/, components/, pages/, hooks/, stores/
├── docs/ Markdown reference docs + OpenAPI specs
├── docs-ts/ This documentation site (Next.js + Markdoc)
└── scripts/
Build systems by component
| Component | Language | Build tool | Manifest |
|---|---|---|---|
| backend/platform/* | Kotlin | Gradle (multi-project) | settings.gradle.kts, build.gradle.kts |
| backend/ai-services/* | Python | Poetry | pyproject.toml |
| backend/cli | Go | Go modules | go.mod |
| frontend | TypeScript / React | Vite + npm | package.json |
| docs-ts | TypeScript / Next.js | npm | package.json |
The Kotlin services are a single Gradle multi-project build rooted at backend/platform/settings.gradle.kts; modules are addressed with paths such as :platform:metadata-service.
Local development setup
Prerequisites
| Tool | Version | Install |
|---|---|---|
| JDK | 21+ | sdk install java 21-graal (SDKMAN) |
| Gradle | 8.x | Bundled via the ./gradlew wrapper |
| Python | 3.12+ | pyenv install 3.12 |
| Poetry | 1.8+ | curl -sSL https://install.python-poetry.org \| python3 - |
| Go | 1.22+ | go install golang.org/dl/go1.22@latest |
| Node.js | 20+ | nvm install 20 |
| Docker | 24+ | Docker Desktop or colima |
| Docker Compose | 2.x | Bundled with Docker Desktop |
Start local infrastructure
# From backend/ — start PostgreSQL, Keycloak, and Redis
docker-compose up -d
# Verify the stack
docker-compose ps
The Compose stack starts:
| Service | Version | Port | Notes |
|---|---|---|---|
| PostgreSQL | 15 | 5432 | Shared database for all services |
| Keycloak | 24 | 9090 | Identity provider — admin login admin/admin |
| Redis | latest | 6379 | Backs gateway rate limiting |
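`docker-compose ps` reports container state. A quick complementary check is whether each port actually accepts TCP connections from the host. The following is a minimal sketch using the ports from the table above; the helper names `port_open` and `check_infra` are illustrative and not part of the repository. A successful connection only shows that something is listening, not that the service is fully initialised.

```python
import socket

# Ports taken from the Compose table above.
INFRA_PORTS = {"PostgreSQL": 5432, "Keycloak": 9090, "Redis": 6379}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_infra(host: str = "localhost") -> dict[str, bool]:
    """Map each infrastructure service name to its reachability."""
    return {name: port_open(host, port) for name, port in INFRA_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_infra().items():
        print(f"{name:<12}{'up' if ok else 'DOWN'}")
```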
Build and test commands per component
Kotlin services (Gradle)
Run a service locally with the Spring Boot bootRun task. Invoke ./gradlew from backend/platform/, where the wrapper lives:
# Run each service on its port
./gradlew :platform:metadata-service:bootRun # 8081
./gradlew :platform:pipeline-engine:bootRun # 8082
./gradlew :platform:lineage-service:bootRun # 8084
./gradlew :platform:monitor-service:bootRun # 8085
./gradlew :platform:api-gateway:bootRun # 8080
| Task | Command |
|---|---|
| Build everything | ./gradlew build |
| All unit tests | ./gradlew test |
| One module's tests | ./gradlew :platform:metadata-service:test |
| Tests with coverage | ./gradlew test jacocoTestReport |
| Integration tests (Docker required) | ./gradlew integrationTest |
| Kotlin lint | ./gradlew ktlintCheck |
Python AI services (Poetry)
# Copilot service (port 8086)
cd backend/ai-services/copilot
poetry install
poetry run uvicorn copilot.main:app --host 0.0.0.0 --port 8086 --reload
# Migration engine (port 8087)
cd backend/ai-services/migration-engine
poetry install
poetry run uvicorn migration.main:app --host 0.0.0.0 --port 8087 --reload
| Task | Command |
|---|---|
| Install dependencies | poetry install |
| Run Copilot tests | poetry run pytest tests/ -v (in copilot/) |
| Run Migration tests | poetry run pytest tests/ -v (in migration-engine/) |
| Python lint | poetry run ruff check . |
Go CLI
cd backend/cli
go build -o dataflow ./cmd
# Verify
./dataflow version
./dataflow login --server http://localhost:8080
| Task | Command |
|---|---|
| Build | go build -o dataflow ./cmd |
| Test | go test ./... |
| Format | gofmt -w . |
Frontend
cd frontend
npm install
npm run dev # Vite dev server at http://localhost:5173
| Task | Command |
|---|---|
| Install dependencies | npm install |
| Dev server | npm run dev |
| Production build | npm run build |
| Tests (single run) | npm run test:run |
| Tests (watch) | npm run test |
| Tests with coverage | npm run test:coverage |
| Lint | ESLint via the project config |
OpenAPI spec validation
cd docs/api
npx vitest run __tests__/validate-openapi.test.ts
The local Docker Compose dev loop
A typical inner-loop session combines containerised infrastructure with locally run services, so you can hot-reload the component you are working on.
| Step | Action |
|---|---|
| 1 | docker-compose up -d from backend/ to start PostgreSQL, Keycloak, Redis |
| 2 | docker-compose ps to confirm all three are healthy |
| 3 | Run the service(s) you are changing with bootRun / uvicorn --reload / npm run dev |
| 4 | Run the gateway (:platform:api-gateway:bootRun) so the frontend has a single entry point |
| 5 | Edit, save, let the dev server reload; re-run the relevant test task |
| 6 | docker-compose down when finished (down -v to drop volumes for a clean DB) |
Run only what you touch
You rarely need every service running. To work on lineage, start PostgreSQL via Compose, run :platform:lineage-service:bootRun and :platform:api-gateway:bootRun, and leave the others stopped. The gateway tolerates downstream services being absent and returns a clear error for routes that need them.
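To make this concrete, the sketch below returns the minimal command set for working on a single Kotlin service. The service names and commands come from this guide; the `dev_commands` helper itself is hypothetical. It starts the whole Compose stack because the Compose service names needed to start PostgreSQL alone are not given here.

```python
# Kotlin services and their ports, as listed in "Service responsibilities".
KOTLIN_SERVICES = {
    "metadata-service": 8081,
    "pipeline-engine": 8082,
    "lineage-service": 8084,
    "monitor-service": 8085,
}


def dev_commands(service: str) -> list[str]:
    """Return the minimal commands to work on one Kotlin service."""
    if service not in KOTLIN_SERVICES:
        raise ValueError(f"unknown service: {service}")
    return [
        "docker-compose up -d",                     # infrastructure, run from backend/
        f"./gradlew :platform:{service}:bootRun",   # the service under change
        "./gradlew :platform:api-gateway:bootRun",  # single entry point on port 8080
    ]
```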
Debugging
| Component | How to debug |
|---|---|
| Kotlin services | Start bootRun with --debug-jvm, which suspends the JVM until a debugger attaches on port 5005, then attach a remote debugger; or run the service's Application class directly from the IDE |
| Python services | Run uvicorn with --reload; use pdb/debugpy or attach the IDE debugger to the Uvicorn process |
| Go CLI | Build with go build and run under dlv (Delve), or use the IDE Go debugger on ./cmd |
| Frontend | The Vite dev server provides source maps; use browser DevTools and React DevTools |
| API contracts | Start any service and open /swagger-ui.html, or load the spec from docs/api/ into Swagger Editor |
OpenAPI specs
Full OpenAPI specifications live in docs/api/:
| Spec file | Service |
|---|---|
| openapi-metadata.yaml | metadata-service |
| openapi-pipeline-engine.yaml | pipeline-engine |
| openapi-monitor.yaml | monitor-service |
| openapi-lineage.yaml | lineage-service |
| openapi-gateway.yaml | api-gateway (aggregated, with auth) |
Coding standards
The project enforces one style per language. CI runs the corresponding linter and fails the build on a violation.
| Language | Linter / formatter | Rules |
|---|---|---|
| Kotlin | ktlint | Default rules, 4-space indentation |
| Python | ruff + Black | Default rules, 120-character line length |
| Go | gofmt | Standard formatting |
| TypeScript / React | ESLint + Prettier | Project ESLint config |
Run the linters before pushing:
./gradlew ktlintCheck # Kotlin
poetry run ruff check . # Python (from a service directory)
gofmt -l . # Go — lists files needing formatting
Contributing workflow
Branch naming
feature/<ticket-id>-<short-description> # New features
fix/<ticket-id>-<short-description> # Bug fixes
refactor/<description> # Refactoring
docs/<description> # Documentation only
Examples: feature/DF-1234-add-kafka-connector, fix/DF-5678-scheduler-timezone-bug.
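The convention can be checked mechanically, for example in a pre-push hook. The sketch below is one possible encoding. It assumes ticket IDs look like `DF-1234` (uppercase prefix, hyphen, digits) and that descriptions are lowercase words joined by hyphens; both assumptions are inferred from the examples rather than stated as rules.

```python
import re

# feature/ and fix/ require a ticket id; refactor/ and docs/ do not.
# The ticket and description shapes are inferred from the examples above.
BRANCH_RE = re.compile(
    r"^(?:(?:feature|fix)/[A-Z]+-\d+-[a-z0-9]+(?:-[a-z0-9]+)*"
    r"|(?:refactor|docs)/[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def is_valid_branch(name: str) -> bool:
    """Check a branch name against the naming convention."""
    return BRANCH_RE.fullmatch(name) is not None
```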
Commit message format
The project follows Conventional Commits:
<type>(<scope>): <subject>

<body>

<footer>
| Field | Allowed values |
|---|---|
| type | feat, fix, refactor, test, docs, chore, ci |
| scope | metadata, engine, lineage, monitor, copilot, migration, gateway, cli, ui |
feat(engine): add Kafka streaming connector support

Implements KafkaConnector extending NativeConnectorBase with
consumer group management, offset tracking, and exactly-once
semantics via idempotent producer.

Closes: DF-1234
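A commit-msg hook could enforce the header format. The following is a minimal sketch built from the allowed types and scopes in the table above. It treats the scope as mandatory, which is an assumption based on the template, and `check_commit_message` is an illustrative name rather than an existing project tool.

```python
import re

TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "ci")
SCOPES = ("metadata", "engine", "lineage", "monitor", "copilot",
          "migration", "gateway", "cli", "ui")

# Header shape: <type>(<scope>): <subject>, with a non-empty subject.
HEADER_RE = re.compile(
    r"^(?:" + "|".join(TYPES) + r")\((?:" + "|".join(SCOPES) + r")\): \S.*$"
)


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    problems = []
    if not lines or not HEADER_RE.match(lines[0]):
        problems.append("header must look like <type>(<scope>): <subject>")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line between the header and the body")
    return problems
```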
Pull request process
- Create a feature branch from main.
- Implement the change together with its tests.
- Ensure all tests pass: ./gradlew test.
- Ensure lint passes: ./gradlew ktlintCheck (Kotlin), poetry run ruff check . (Python).
- Push the branch and open a PR on GitHub.
- A PR is mergeable when it has at least one approval from a code owner, all CI checks pass (build, test, lint, security scan), and there are no merge conflicts.
- Squash-merge to main.
Tests are not optional
A PR that adds a feature without tests will not pass review. Unit tests are the minimum; if the change touches a service boundary or the database, add an integration test that runs under ./gradlew integrationTest against the Docker stack.
Adding a new backend service
A new Kotlin service joins the Gradle multi-project build. The high-level steps:
| Step | Action |
|---|---|
| 1 | Create backend/platform/<service-name>/ with a build.gradle.kts |
| 2 | Add include(":platform:<service-name>") to settings.gradle.kts |
| 3 | Depend on :platform:common for shared model and enums |
| 4 | Pick an unused port (the gateway uses 8080; existing services use 8081, 8082, and 8084–8087) and set it in the service config |
| 5 | Add a route in api-gateway so the service is reachable through port 8080 |
| 6 | Write an OpenAPI spec under docs/api/ and wire it into the validation test |
| 7 | If the service owns tables, add database migrations to the shared PostgreSQL schema |
| 8 | Add unit tests; add integration tests under :platform:integration-tests |
A Python AI service follows the same shape under backend/ai-services/: a Poetry project with pyproject.toml, a FastAPI app served by Uvicorn, a tests/ directory exercised by pytest, and a gateway route.
Adding a new frontend feature
The frontend is a React + Vite + TypeScript single-page app. Source is organised by responsibility under frontend/src/:
| Directory | Contents |
|---|---|
| api/ | Typed API client modules per backend domain |
| auth/ | Keycloak integration and auth state |
| components/ | Reusable UI components |
| pages/ | Route-level page components |
| hooks/ | Custom React hooks |
| stores/ | Client-side state |
| router.tsx | Route table |
To add a feature:
| Step | Action |
|---|---|
| 1 | Add a page component under src/pages/ |
| 2 | Register its route in src/router.tsx |
| 3 | Add a typed API module under src/api/ that calls the backend through the gateway |
| 4 | Extract shared UI into src/components/ |
| 5 | Add component/unit tests; run npm run test:run |
| 6 | Verify in the dev server: npm run dev, open http://localhost:5173 |
Talk to the gateway, not services
Frontend API modules should call the API gateway on port 8080, never a backend service port directly. The gateway handles auth forwarding, rate limiting, and SSE/WebSocket proxying; bypassing it breaks those guarantees and will fail in deployed environments.
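One way to catch violations early is a small script that scans source text for URLs pointing at a backend service port. The sketch below uses the service ports listed in this guide and leaves 8080 alone. It is a heuristic with a hypothetical function name and would miss ports assembled at runtime or read from configuration.

```python
import re

# Backend service ports listed in this guide; 8080 (the gateway) is allowed.
SERVICE_PORTS = (8081, 8082, 8084, 8085, 8086, 8087)
DIRECT_URL_RE = re.compile(
    r"https?://[\w.-]+:(" + "|".join(map(str, SERVICE_PORTS)) + r")\b"
)


def find_direct_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, matched URL prefix) for each direct service URL."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in DIRECT_URL_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running it over the contents of each file under frontend/src/api/ and failing on a non-empty result would turn the rule into an automated check.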
Command reference
| Goal | Command |
|---|---|
| Start infrastructure | docker-compose up -d (from backend/) |
| Run a Kotlin service | ./gradlew :platform:<service>:bootRun |
| Build all Kotlin modules | ./gradlew build |
| All Kotlin unit tests | ./gradlew test |
| Kotlin integration tests | ./gradlew integrationTest |
| Kotlin lint | ./gradlew ktlintCheck |
| Run a Python service | poetry run uvicorn <module>.main:app --reload --port <port> |
| Python tests | poetry run pytest tests/ -v |
| Python lint | poetry run ruff check . |
| Build the CLI | go build -o dataflow ./cmd |
| Run the frontend | npm run dev |
| Frontend tests | npm run test:run |
| Validate OpenAPI specs | npx vitest run __tests__/validate-openapi.test.ts (from docs/api/) |