Developer guide

This guide is for engineers building on the DataFlow AI Platform. It covers the monorepo layout, how to set up a local development environment, the build and test commands for each component, the coding standards the project enforces, the Docker Compose dev loop, debugging, the contributing workflow, and how to add a new backend service or frontend feature.


Architecture at a glance

DataFlow AI is a microservices platform with domain-driven service boundaries. An API gateway fronts six backend services that share a single PostgreSQL database.

                           +-------------------+
                           |    API Gateway    |
                           |   (Spring Cloud)  |
                           |     port 8080     |
                           +---------+---------+
                                     |
       +-----------+-----------+-----+-----+-----------+-----------+
       |           |           |           |           |           |
  +----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
  |Metadata | |Pipeline | | Lineage | | Monitor | |   AI    | |Migration|
  | Service | | Engine  | | Service | | Service | | Copilot | | Engine  |
  |  :8081  | |  :8082  | |  :8084  | |  :8085  | |  :8086  | |  :8087  |
  +----+----+ +----+----+ +----+----+ +----+----+ +----+----+ +----+----+
       |           |           |           |           |           |
       +-----------+-----------+-----+-----+-----------+-----------+
                                     |
                              +------+------+
                              | PostgreSQL  |
                              | (shared DB) |
                              +-------------+

Service responsibilities

Service           Language  Port  Responsibility
----------------  --------  ----  --------------
api-gateway       Kotlin    8080  Routing, rate limiting, auth forwarding, SSE/WS proxy
metadata-service  Kotlin    8081  Pipeline CRUD, connections, quality rules, catalog, audit
pipeline-engine   Kotlin    8082  Compilation, execution, scheduling, Git, orchestration
lineage-service   Kotlin    8084  Lineage graph, column lineage, impact analysis, search
monitor-service   Kotlin    8085  Dashboard metrics, alerts, SSE streaming, system health
copilot (AI)      Python    8086  NL-to-pipeline, chat, SQL generation, RAG, suggestions
migration-engine  Python    8087  PowerCenter XML / Alteryx YXMD to YAML conversion
cli               Go        -     Command-line interface for developers

Communication is synchronous REST between the gateway and services, SSE for alert/metrics streaming, and WebSocket for pipeline status. Kafka for event-driven lineage ingestion is planned but not yet wired.
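For readers who have not worked with SSE before, the following is a minimal sketch of how a client might split a text/event-stream body into events. It assumes only the standard SSE wire format; the event name "alert" and the JSON payload in the comments are hypothetical and are not taken from the monitor-service contract.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SseEvent:
    event: str  # event type; "message" when the stream does not name one
    data: str   # payload; multiple data fields are joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Yield events from text/event-stream lines (line terminators already removed)."""
    event_type, data_parts = "message", []
    for line in lines:
        if line == "":
            # A blank line ends the current event; events without data are dropped.
            if data_parts:
                yield SseEvent(event_type, "\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_type = value  # e.g. "alert" (hypothetical name)
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

Feeding it the lines of a response body yields one SseEvent per blank-line-terminated block, which is enough to experiment with a stream before wiring up a real client.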


Monorepo layout

The repository is a polyglot monorepo. The top-level directories that matter for development are:

polcomtel/
├── backend/
│   ├── platform/            Kotlin services (Gradle multi-project build)
│   │   ├── api-gateway/
│   │   ├── metadata-service/
│   │   ├── pipeline-engine/
│   │   ├── lineage-service/
│   │   ├── monitor-service/
│   │   ├── common/          Shared model + enums
│   │   ├── connector-sdk/   Connector framework + 21 connectors
│   │   ├── pushdown-sql/    SQL dialect / push-down support
│   │   ├── integration-tests/
│   │   ├── settings.gradle.kts
│   │   └── gradlew
│   ├── ai-services/         Python services (Poetry)
│   │   ├── copilot/
│   │   └── migration-engine/
│   ├── cli/                 Go CLI (cmd/, internal/, pkg/)
│   ├── docker/
│   ├── docker-compose.yml
│   └── Makefile
├── frontend/                React + Vite + TypeScript SPA
│   └── src/                 api/, auth/, components/, pages/, hooks/, stores/
├── docs/                    Markdown reference docs + OpenAPI specs
├── docs-ts/                 This documentation site (Next.js + Markdoc)
└── scripts/

Build systems by component

Component              Language              Build tool              Manifest
---------------------  --------------------  ----------------------  --------
backend/platform/*     Kotlin                Gradle (multi-project)  settings.gradle.kts, build.gradle.kts
backend/ai-services/*  Python                Poetry                  pyproject.toml
backend/cli            Go                    Go modules              go.mod
frontend               TypeScript / React    Vite + npm              package.json
docs-ts                TypeScript / Next.js  npm                     package.json

The Kotlin services are a single Gradle multi-project build rooted at backend/platform/settings.gradle.kts; modules are addressed with paths such as :platform:metadata-service.


Local development setup

Prerequisites

Tool            Version  Install
--------------  -------  -------
JDK             21+      sdk install java 21-graal (SDKMAN)
Gradle          8.x      Bundled via the ./gradlew wrapper
Python          3.12+    pyenv install 3.12
Poetry          1.8+     curl -sSL https://install.python-poetry.org | python3 -
Go              1.22+    go install golang.org/dl/go1.22.0@latest, then go1.22.0 download
Node.js         20+      nvm install 20
Docker          24+      Docker Desktop or colima
Docker Compose  2.x      Bundled with Docker Desktop

Start local infrastructure

# From backend/ — start PostgreSQL, Keycloak, and Redis
docker-compose up -d

# Verify the stack
docker-compose ps

The Compose stack starts:

Service     Version  Port  Notes
----------  -------  ----  -----
PostgreSQL  15       5432  Shared database for all services
Keycloak    24       9090  Identity provider; admin login admin/admin
Redis       latest   6379  Backs gateway rate limiting

Build and test commands per component

Kotlin services (Gradle)

Run a service locally with the Spring Boot bootRun task:

# From backend/platform/ (where the gradlew wrapper lives), run each service on its port
./gradlew :platform:metadata-service:bootRun    # 8081
./gradlew :platform:pipeline-engine:bootRun     # 8082
./gradlew :platform:lineage-service:bootRun     # 8084
./gradlew :platform:monitor-service:bootRun     # 8085
./gradlew :platform:api-gateway:bootRun         # 8080

Task                                 Command
-----------------------------------  -------
Build everything                     ./gradlew build
All unit tests                       ./gradlew test
One module's tests                   ./gradlew :platform:metadata-service:test
Tests with coverage                  ./gradlew test jacocoTestReport
Integration tests (Docker required)  ./gradlew integrationTest
Kotlin lint                          ./gradlew ktlintCheck

Python AI services (Poetry)

# Paths are relative to the repository root; run each service in its own terminal.

# Copilot service (port 8086)
cd backend/ai-services/copilot
poetry install
poetry run uvicorn copilot.main:app --host 0.0.0.0 --port 8086 --reload

# Migration engine (port 8087)
cd backend/ai-services/migration-engine
poetry install
poetry run uvicorn migration.main:app --host 0.0.0.0 --port 8087 --reload

Task                  Command
--------------------  -------
Install dependencies  poetry install
Run Copilot tests     poetry run pytest tests/ -v (in copilot/)
Run Migration tests   poetry run pytest tests/ -v (in migration-engine/)
Python lint           poetry run ruff check .

Go CLI

cd backend/cli
go build -o dataflow ./cmd

# Verify
./dataflow version
./dataflow login --server http://localhost:8080

Task    Command
------  -------
Build   go build -o dataflow ./cmd
Test    go test ./...
Format  gofmt -w .

Frontend

cd frontend
npm install
npm run dev          # Vite dev server at http://localhost:5173

Task                  Command
--------------------  -------
Install dependencies  npm install
Dev server            npm run dev
Production build      npm run build
Tests (single run)    npm run test:run
Tests (watch)         npm run test
Tests with coverage   npm run test:coverage
Lint                  ESLint via the project config

OpenAPI spec validation

cd docs/api
npx vitest run __tests__/validate-openapi.test.ts

The local Docker Compose dev loop

A typical inner-loop session combines containerised infrastructure with locally-run services so you can hot-reload the component you are working on.

Step  Action
----  ------
1     docker-compose up -d from backend/ to start PostgreSQL, Keycloak, Redis
2     docker-compose ps to confirm all three are healthy
3     Run the service(s) you are changing with bootRun / uvicorn --reload / npm run dev
4     Run the gateway (:platform:api-gateway:bootRun) so the frontend has a single entry point
5     Edit, save, let the dev server reload; re-run the relevant test task
6     docker-compose down when finished (down -v to drop volumes for a clean DB)

Run only what you touch

You rarely need every service running. To work on lineage, start the Compose infrastructure (PostgreSQL for the service, plus Redis and Keycloak, which the gateway relies on for rate limiting and authentication), run :platform:lineage-service:bootRun and :platform:api-gateway:bootRun, and leave the other services stopped. The gateway tolerates downstream services being absent and returns a clear error for routes that need them.
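When only part of the stack is running, it helps to see at a glance which ports are actually accepting connections. The following is a small sketch of such a probe, not a project tool: the port numbers come from the tables in this guide, while the names LOCAL_PORTS, is_listening, and report are made up for illustration.

```python
import socket

# Ports taken from the tables in this guide; the selection matches the
# lineage example (infrastructure, gateway, and the one service being edited).
LOCAL_PORTS = {
    "postgresql": 5432,
    "redis": 6379,
    "keycloak": 9090,
    "api-gateway": 8080,
    "lineage-service": 8084,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(ports: dict[str, int]) -> dict[str, bool]:
    """Map each component name to whether its port is reachable."""
    return {name: is_listening(port) for name, port in ports.items()}
```

Calling report(LOCAL_PORTS) returns a dictionary of component names to booleans. A True value only shows that the port is open, not that the service behind it is healthy.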


Debugging

Component        How to debug
---------------  ------------
Kotlin services  Start bootRun with Gradle's --debug-jvm option (the JVM starts suspended and listens on port 5005) and attach a remote debugger, or run the service's Application class directly from the IDE
Python services  Run uvicorn with --reload; use pdb/debugpy or attach the IDE debugger to the Uvicorn process
Go CLI           Run dlv debug ./cmd (Delve), or use the IDE Go debugger on ./cmd
Frontend         The Vite dev server provides source maps; use browser DevTools and React DevTools
API contracts    Start any service and open /swagger-ui.html, or load the spec from docs/api/ into Swagger Editor

OpenAPI specs

Full OpenAPI specifications live in docs/api/:

Spec file                     Service
----------------------------  -------
openapi-metadata.yaml         metadata-service
openapi-pipeline-engine.yaml  pipeline-engine
openapi-monitor.yaml          monitor-service
openapi-lineage.yaml          lineage-service
openapi-gateway.yaml          api-gateway (aggregated, with auth)
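A quick pre-check can confirm that every spec file listed above is present before the real validation runs. The sketch below is an illustration only and does not replace the vitest validation in docs/api/; the file names come from the table, and the function name missing_specs is hypothetical.

```python
from pathlib import Path

# File names copied from the table above; the directory is supplied by the caller.
EXPECTED_SPECS = {
    "metadata-service": "openapi-metadata.yaml",
    "pipeline-engine": "openapi-pipeline-engine.yaml",
    "monitor-service": "openapi-monitor.yaml",
    "lineage-service": "openapi-lineage.yaml",
    "api-gateway": "openapi-gateway.yaml",
}


def missing_specs(spec_dir: Path) -> list[str]:
    """Return the services whose spec file is absent from spec_dir, sorted by name."""
    return sorted(
        service
        for service, filename in EXPECTED_SPECS.items()
        if not (spec_dir / filename).is_file()
    )
```

Run from the repository root, missing_specs(Path("docs/api")) would return an empty list when all five files exist. It checks presence only, not whether the YAML is a valid OpenAPI document.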

Coding standards

The project enforces one style per language. CI runs the corresponding linter and fails the build on a violation.

Language            Linter / formatter  Rules
------------------  ------------------  -----
Kotlin              ktlint              Default rules, 4-space indentation
Python              ruff + Black        Default rules, 120-character line length
Go                  gofmt               Standard formatting
TypeScript / React  ESLint + Prettier   Project ESLint config

Run the linters before pushing:

./gradlew ktlintCheck            # Kotlin
poetry run ruff check .          # Python (from a service directory)
gofmt -l .                       # Go — lists files needing formatting

Contributing workflow

Branch naming

feature/<ticket-id>-<short-description>    # New features
fix/<ticket-id>-<short-description>        # Bug fixes
refactor/<description>                     # Refactoring
docs/<description>                          # Documentation only

Examples: feature/DF-1234-add-kafka-connector, fix/DF-5678-scheduler-timezone-bug.
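The naming scheme can be checked mechanically, for example in a local Git hook. The sketch below is one possible check under two assumptions that this guide does not state: ticket IDs look like DF-1234 (uppercase letters, a hyphen, digits) and descriptions are lowercase kebab-case. Only the four prefixes and the overall shape come from the guide.

```python
import re

# Assumptions for illustration: ticket IDs look like "DF-1234" and descriptions
# are lowercase kebab-case. The guide fixes only the prefixes and overall shape.
_TICKET = r"[A-Z]+-\d+"
_DESCRIPTION = r"[a-z0-9]+(?:-[a-z0-9]+)*"

BRANCH_PATTERN = re.compile(
    rf"(?:(?:feature|fix)/{_TICKET}-{_DESCRIPTION}|(?:refactor|docs)/{_DESCRIPTION})"
)


def is_valid_branch(name: str) -> bool:
    """Return True if name follows one of the four branch-naming forms."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Both examples above pass this check, while a feature branch without a ticket ID, such as feature/add-kafka-connector, is rejected.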

Commit message format

The project follows Conventional Commits:

<type>(<scope>): <subject>

<body>

<footer>

Field  Allowed values
-----  --------------
type   feat, fix, refactor, test, docs, chore, ci
scope  metadata, engine, lineage, monitor, copilot, migration, gateway, cli, ui

Example:

feat(engine): add Kafka streaming connector support

Implements KafkaConnector extending NativeConnectorBase with
consumer group management, offset tracking, and exactly-once
semantics via idempotent producer.

Closes: DF-1234
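The header line of a commit message can be validated against the allowed types and scopes. The following is a minimal sketch of such a check, assuming the scope is mandatory as the template suggests; the function name check_header and the wording of the error messages are hypothetical, and the project may enforce the format differently in CI.

```python
import re

# Allowed values copied from the type and scope lists in this section.
TYPES = ("feat", "fix", "refactor", "test", "docs", "chore", "ci")
SCOPES = ("metadata", "engine", "lineage", "monitor", "copilot",
          "migration", "gateway", "cli", "ui")

# <type>(<scope>): <subject>, with a non-empty subject.
HEADER = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[a-z]+)\): (?P<subject>\S.*)$")


def check_header(message: str) -> list[str]:
    """Return a list of problems with the first line of a commit message."""
    first_line = message.splitlines()[0] if message.strip() else ""
    match = HEADER.match(first_line)
    if match is None:
        return ["header must look like '<type>(<scope>): <subject>'"]
    problems = []
    if match["type"] not in TYPES:
        problems.append(f"unknown type '{match['type']}'")
    if match["scope"] not in SCOPES:
        problems.append(f"unknown scope '{match['scope']}'")
    return problems
```

An empty list means the header is acceptable. The body and footer are not inspected, so a message with a valid header and a malformed footer still passes.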

Pull request process

  1. Create a feature branch from main.
  2. Implement the change together with its tests.
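  3. Ensure the tests pass for every component you touched: ./gradlew test (Kotlin), poetry run pytest tests/ -v (Python), go test ./... (Go), npm run test:run (frontend).
  4. Ensure lint passes: ./gradlew ktlintCheck (Kotlin), poetry run ruff check . (Python), gofmt -l . (Go), ESLint via the project config (frontend).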
  5. Push the branch and open a PR on GitHub.
  6. A PR is mergeable when it has at least one approval from a code owner, all CI checks pass (build, test, lint, security scan), and there are no merge conflicts.
  7. Squash-merge to main.

Tests are not optional

A PR that adds a feature without tests will not pass review. Unit tests are the minimum; if the change touches a service boundary or the database, add an integration test that runs under ./gradlew integrationTest against the Docker stack.


Adding a new backend service

A new Kotlin service joins the Gradle multi-project build. The high-level steps:

Step  Action
----  ------
1     Create backend/platform/<service-name>/ with a build.gradle.kts
2     Add include(":platform:<service-name>") to settings.gradle.kts
3     Depend on :platform:common for shared model and enums
4     Pick an unused port and set it in the service config (8080 is the gateway and 8081-8087 are treated as taken; 8083 is not listed in the service table, so confirm before reusing it)
5     Add a route in api-gateway so the service is reachable through port 8080
6     Write an OpenAPI spec under docs/api/ and wire it into the validation test
7     If the service owns tables, add database migrations to the shared PostgreSQL schema
8     Add unit tests; add integration tests under :platform:integration-tests

A Python AI service follows the same shape under backend/ai-services/: a Poetry project with pyproject.toml, a FastAPI app served by Uvicorn, a tests/ directory exercised by pytest, and a gateway route.


Adding a new frontend feature

The frontend is a React + Vite + TypeScript single-page app. Source is organised by responsibility under frontend/src/:

Directory    Contents
-----------  --------
api/         Typed API client modules per backend domain
auth/        Keycloak integration and auth state
components/  Reusable UI components
pages/       Route-level page components
hooks/       Custom React hooks
stores/      Client-side state
router.tsx   Route table

To add a feature:

Step  Action
----  ------
1     Add a page component under src/pages/
2     Register its route in src/router.tsx
3     Add a typed API module under src/api/ that calls the backend through the gateway
4     Extract shared UI into src/components/
5     Add component/unit tests; run npm run test:run
6     Verify in the dev server: npm run dev, open http://localhost:5173

Talk to the gateway, not services

Frontend API modules should call the API gateway on port 8080, never a backend service port directly. The gateway handles auth forwarding, rate limiting, and SSE/WebSocket proxying — bypassing it breaks those guarantees and will fail in deployed environments.
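The same rule is worth applying to scripts and tests that call the platform. The sketch below, written in Python like the other examples in this guide rather than in the frontend's TypeScript, builds request URLs from a gateway base address and refuses a base that points at a backend service port. The path /api/v1/pipelines mentioned afterwards is a hypothetical route, and gateway_url is not an existing project helper.

```python
from urllib.parse import urljoin, urlsplit

GATEWAY_BASE = "http://localhost:8080/"  # local gateway address used in this guide
SERVICE_PORTS = {8081, 8082, 8084, 8085, 8086, 8087}  # backend ports from the service table


def gateway_url(path: str, base: str = GATEWAY_BASE) -> str:
    """Build an absolute URL for path that goes through the gateway."""
    port = urlsplit(base).port
    if port in SERVICE_PORTS:
        raise ValueError(f"base URL targets backend port {port}; use the gateway instead")
    # Strip the leading slash so the path is resolved relative to the base.
    return urljoin(base, path.lstrip("/"))
```

With the default base, gateway_url("/api/v1/pipelines") resolves against port 8080, and passing a base such as http://localhost:8084/ raises ValueError instead of silently bypassing the gateway.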


Command reference

Goal                      Command
------------------------  -------
Start infrastructure      docker-compose up -d (from backend/)
Run a Kotlin service      ./gradlew :platform:<service>:bootRun
Build all Kotlin modules  ./gradlew build
All Kotlin unit tests     ./gradlew test
Kotlin integration tests  ./gradlew integrationTest
Kotlin lint               ./gradlew ktlintCheck
Run a Python service      poetry run uvicorn <module>.main:app --reload --port <port>
Python tests              poetry run pytest tests/ -v
Python lint               poetry run ruff check .
Build the CLI             go build -o dataflow ./cmd
Run the frontend          npm run dev
Frontend tests            npm run test:run
Validate OpenAPI specs    npx vitest run __tests__/validate-openapi.test.ts (from docs/api/)