API reference: Monitor & AI
This page documents the Monitor service (openapi-monitor.yaml, port 8085) and the proxied AI Copilot, Migration Engine, and CDC connector services. All Monitor endpoints are served under /api/v1 and require a bearerAuth JWT.
Dashboard
Dashboard endpoints aggregate platform metrics, connector health, and system health.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/dashboard | getDashboard | Aggregated dashboard metrics | JWT |
| GET | /api/v1/monitor/pipelines/{id}/metrics | getPipelineMetrics | Detailed pipeline metrics (percentiles, SLA) | JWT |
| POST | /api/v1/monitor/pipelines/run | recordPipelineRun | Internal — record a pipeline run result | JWT |
| GET | /api/v1/monitor/connectors/health | getConnectorsHealth | Connector health statuses + summary | JWT |
| POST | /api/v1/monitor/connectors/health | recordConnectorHealth | Internal — record a connector health check | JWT |
| GET | /api/v1/monitor/system/health | getSystemHealth | System health (CPU/mem/disk/threads/services) | JWT |
DashboardMetrics includes totalPipelinesRun, successRate, avgDurationMs, rowsProcessed, activeAlerts, pipelinesByStatus, recentRuns[], topErrors[], connectorHealth[], timeSeriesData[]. PipelineMetricsDetail adds p50/p95/p99 durations and slaComplianceRate.
// GET /api/v1/monitor/dashboard → 200 OK (DashboardResponse; connectorHealth[] and timeSeriesData[] omitted here)
{
"totalPipelinesRun": 1284,
"successRate": 0.973,
"avgDurationMs": 184200,
"rowsProcessed": 9821044,
"activeAlerts": 3,
"pipelinesByStatus": { "SUCCESS": 1249, "FAILED": 28, "RUNNING": 7 },
"recentRuns": [],
"topErrors": []
}
// GET /api/v1/monitor/pipelines/{id}/metrics → 200 OK (PipelineMetricsDetail)
{
"pipelineId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"p50DurationMs": 162000,
"p95DurationMs": 240000,
"p99DurationMs": 318000,
"slaComplianceRate": 0.991
}
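Because every Monitor endpoint requires a bearer JWT, a client attaches the token to each request. The following is a minimal sketch of that, not an official client: BASE_URL (the host in particular) and the build_monitor_request helper are assumptions made for illustration; only port 8085, the /api/v1 prefix, and the dashboard path come from this page.

```python
import urllib.request

# Assumption: the Monitor service is reachable on localhost. Only the port
# (8085) and the /api/v1 prefix are taken from this page.
BASE_URL = "http://localhost:8085"


def build_monitor_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request for a Monitor endpoint with a bearer JWT attached."""
    if not path.startswith("/api/v1/monitor/"):
        raise ValueError(f"not a Monitor path: {path}")
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# "<jwt>" is a placeholder for a real token.
request = build_monitor_request("/api/v1/monitor/dashboard", "<jwt>")
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

The same helper works for getPipelineMetrics, getConnectorsHealth, and getSystemHealth by changing the path.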
Note
recordPipelineRun and recordConnectorHealth are internal ingestion endpoints used by the platform itself. They are documented for completeness but are not intended for direct client use.
Alerts
Alert endpoints manage individual alert records, including bulk operations, history, and purge.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/alerts | getAlerts | List alerts | JWT |
| POST | /api/v1/monitor/alerts | createAlert | Create an alert | JWT |
| GET | /api/v1/monitor/alerts/{id} | getAlert | Get an alert by ID | JWT |
| DELETE | /api/v1/monitor/alerts/{id} | deleteAlert | Delete an alert | JWT |
| POST | /api/v1/monitor/alerts/{id}/ack | acknowledgeAlert | Acknowledge an alert | JWT |
| POST | /api/v1/monitor/alerts/{id}/resolve | resolveAlert | Resolve an alert | JWT |
| POST | /api/v1/monitor/alerts/bulk/ack | bulkAcknowledge | Bulk acknowledge alerts | JWT |
| POST | /api/v1/monitor/alerts/bulk/resolve | bulkResolve | Bulk resolve alerts | JWT |
| GET | /api/v1/monitor/alerts/history | getAlertHistory | Alert history | JWT |
| GET | /api/v1/monitor/alerts/summary | getAlertSummary | Alert summary counts | JWT |
| POST | /api/v1/monitor/alerts/purge | purgeResolved | Purge old resolved alerts | JWT |
getAlerts accepts status, severity, category, and limit (default 100). getAlertHistory accepts category and sinceSeconds (default 86400). purgeResolved accepts olderThanSeconds (default 604800). Alert.severity: CRITICAL | WARNING | INFO; status: ACTIVE | ACKNOWLEDGED | RESOLVED; category: PIPELINE_FAILURE | PERFORMANCE_DEGRADATION | SECURITY_VIOLATION | DATA_QUALITY | SYSTEM_HEALTH | SLA_BREACH.
// POST /api/v1/monitor/alerts → 201 Created
// Request: CreateAlertRequest
{
"title": "Pipeline billing-daily-load failed",
"severity": "CRITICAL",
"category": "PIPELINE_FAILURE",
"message": "Task 'load' failed after 3 retries"
}
// POST /api/v1/monitor/alerts/bulk/ack → 200 OK
// Request: BulkAlertRequest
{ "alertIds": ["al-0001", "al-0002", "al-0003"] }
// GET /api/v1/monitor/alerts/summary → 200 OK (AlertSummaryResponse)
{
"active": 3,
"acknowledged": 5,
"resolved": 142,
"bySeverity": { "CRITICAL": 1, "WARNING": 2, "INFO": 0 }
}
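The getAlerts filters and enum values listed above can be checked on the client before a request is sent. This is a sketch under assumptions: alerts_query is a hypothetical helper, and whether the service itself rejects unknown enum values is not stated on this page.

```python
from urllib.parse import urlencode

# Enum values as listed on this page.
STATUSES = {"ACTIVE", "ACKNOWLEDGED", "RESOLVED"}
SEVERITIES = {"CRITICAL", "WARNING", "INFO"}
CATEGORIES = {
    "PIPELINE_FAILURE",
    "PERFORMANCE_DEGRADATION",
    "SECURITY_VIOLATION",
    "DATA_QUALITY",
    "SYSTEM_HEALTH",
    "SLA_BREACH",
}


def alerts_query(status=None, severity=None, category=None, limit=100) -> str:
    """Return the getAlerts path with a query string, rejecting unknown enum values."""
    checks = (
        ("status", status, STATUSES),
        ("severity", severity, SEVERITIES),
        ("category", category, CATEGORIES),
    )
    params = {}
    for name, value, allowed in checks:
        if value is None:
            continue  # filter not requested
        if value not in allowed:
            raise ValueError(f"unknown {name}: {value!r}")
        params[name] = value
    params["limit"] = limit  # 100 is the documented default
    return "/api/v1/monitor/alerts?" + urlencode(params)


url = alerts_query(status="ACTIVE", severity="CRITICAL")
```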
Alert definitions
Alert definitions are reusable rules that the engine evaluates to generate alert events.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/alerts/definitions | getDefinitions | List alert definitions | JWT |
| POST | /api/v1/monitor/alerts/definitions | createDefinition | Create an alert definition | JWT |
| GET | /api/v1/monitor/alerts/definitions/{id} | getDefinition | Get a definition | JWT |
| PUT | /api/v1/monitor/alerts/definitions/{id} | updateDefinition | Update a definition | JWT |
| DELETE | /api/v1/monitor/alerts/definitions/{id} | deleteDefinition | Delete a definition | JWT |
| POST | /api/v1/monitor/alerts/definitions/{id}/toggle | toggleDefinition | Enable/disable a definition | JWT |
| POST | /api/v1/monitor/alerts/definitions/{id}/silence | silenceDefinition | Temporarily silence a definition | JWT |
getDefinitions accepts type and enabled (bool). AlertDefinition.type: PIPELINE_FAILURE | SLA_BREACH | DATA_QUALITY | ROW_COUNT_ANOMALY | CONNECTION_DOWN | RESOURCE_THRESHOLD. Each definition carries an AlertCondition (metric, operator ∈ GREATER_THAN | LESS_THAN | EQUAL | …, threshold, windowMinutes, consecutiveFailures) and recipients[] (NotificationRecipient with channel ∈ EMAIL | SLACK | PAGERDUTY | WEBHOOK).
// POST /api/v1/monitor/alerts/definitions → 201 Created
// Request: AlertDefinition
{
"name": "Billing SLA breach",
"type": "SLA_BREACH",
"enabled": true,
"condition": {
"metric": "duration_ms",
"operator": "GREATER_THAN",
"threshold": 300000,
"windowMinutes": 60,
"consecutiveFailures": 1
},
"recipients": [
{ "channel": "PAGERDUTY", "target": "routing-key-billing" }
]
}
// POST /api/v1/monitor/alerts/definitions/{id}/silence → 200 OK
// Request: { "durationMinutes": 120 }
{
"id": "def-0001",
"name": "Billing SLA breach",
"enabled": true,
"silencedUntil": "2025-12-15T16:30:00Z"
}
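The silencedUntil value in the response above is consistent with a request made at 14:30 UTC for 120 minutes. The sketch below reproduces that arithmetic; it assumes the silence window starts at request time, which this page does not state, and silenced_until is a hypothetical helper rather than part of the API.

```python
from datetime import datetime, timedelta, timezone


def silenced_until(requested_at: datetime, duration_minutes: int) -> str:
    """Return the expected silencedUntil timestamp in the format used on this page.

    requested_at must be timezone-aware. Assumption: the silence window is
    counted from the moment of the request.
    """
    if requested_at.tzinfo is None:
        raise ValueError("requested_at must be timezone-aware")
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    end = requested_at.astimezone(timezone.utc) + timedelta(minutes=duration_minutes)
    return end.strftime("%Y-%m-%dT%H:%M:%SZ")


expected = silenced_until(datetime(2025, 12, 15, 14, 30, tzinfo=timezone.utc), 120)
```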
Alert events
Alert events are individual firings generated by the engine when a definition's condition is met.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/alerts/events | getEvents | List engine-generated alert events | JWT |
| GET | /api/v1/monitor/alerts/events/{id} | getEvent | Get an alert event | JWT |
| POST | /api/v1/monitor/alerts/events/{id}/ack | acknowledgeEvent | Acknowledge an event | JWT |
| POST | /api/v1/monitor/alerts/events/{id}/resolve | resolveEvent | Resolve an event | JWT |
| POST | /api/v1/monitor/alerts/events/{id}/silence | silenceEvent | Silence an event | JWT |
| GET | /api/v1/monitor/alerts/events/history | getEventHistory | Alert event history | JWT |
getEvents accepts status, severity, and type. getEventHistory accepts definitionId (uuid), from/to (date-time), and limit (default 100). acknowledgeEvent and resolveEvent take a {user} body; silenceEvent takes a {durationMinutes} body. AlertEvent.status: FIRING | ACKNOWLEDGED | RESOLVED. Each event carries metricValue, threshold, firedAt, ack/resolve metadata, and silencedUntil.
// GET /api/v1/monitor/alerts/events/{id} → 200 OK (AlertEvent)
{
"id": "ev-0042",
"definitionId": "def-0001",
"status": "FIRING",
"metricValue": 342000,
"threshold": 300000,
"firedAt": "2025-12-15T14:30:00Z",
"silencedUntil": null
}
// POST /api/v1/monitor/alerts/events/{id}/ack → 200 OK
// Request: { "user": "u-1001" }
{
"id": "ev-0042",
"status": "ACKNOWLEDGED",
"acknowledgedBy": "u-1001"
}
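An event's metricValue and threshold can be read against the operator of its definition's AlertCondition. The following sketch shows only that comparison, for the three operators named on this page. It is an illustration, not the engine's implementation: condition_met is a hypothetical helper, and windowMinutes and consecutiveFailures are deliberately ignored because their semantics are not described here.

```python
import operator

# Only the operators spelled out on this page; the documented list is longer.
COMPARATORS = {
    "GREATER_THAN": operator.gt,
    "LESS_THAN": operator.lt,
    "EQUAL": operator.eq,
}


def condition_met(metric_value: float, op: str, threshold: float) -> bool:
    """Compare a metric value against a threshold using a named operator."""
    try:
        compare = COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator in this sketch: {op!r}") from None
    return compare(metric_value, threshold)


# Values from the ev-0042 example and the "Billing SLA breach" definition.
fired = condition_met(342000, "GREATER_THAN", 300000)
```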
SSE streams (real-time)
The Monitor service exposes Server-Sent Events (SSE) streams for real-time alerts and metrics. The gateway routes them through its SSE-passthrough prefix, /api/v1/monitor/sse/**.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/sse/alerts | streamAlerts | SSE stream of real-time alerts | JWT |
| GET | /api/v1/monitor/sse/metrics | streamMetrics | SSE stream of periodic system metrics | JWT |
streamAlerts accepts a severity query param (comma-separated filter) and returns text/event-stream. Its event types are connected (initial confirmation carrying a subscriberId), alert (an alert JSON payload), and heartbeat (keepalive). The connection times out after 30 minutes.
// GET /api/v1/monitor/sse/alerts?severity=CRITICAL,WARNING
event: connected
data: {"subscriberId":"sub-7c2e","at":"2025-12-15T14:30:00Z"}
event: alert
data: {"id":"al-0099","severity":"CRITICAL","title":"Connector teradata-prod down"}
event: heartbeat
data: {}
Heads up
The server closes SSE connections after 30 minutes. To keep receiving real-time alerts, clients must reconnect, and can pass the same severity filter when they do.
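A client consuming streamAlerts needs to split the text/event-stream body into events and dispatch on the event type. The sketch below is a minimal parser for the connected, alert, and heartbeat events described above. It assumes each event is terminated by a blank line, as the SSE wire format requires, and that every data payload is JSON, as in the examples; parse_sse is a hypothetical helper, and reconnection after the 30-minute timeout is left to the caller.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Dict]]:
    """Yield (event_type, payload) pairs from the lines of an SSE body."""
    event_type = "message"  # SSE default when no "event:" field is sent
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


sample = [
    "event: connected\n",
    'data: {"subscriberId":"sub-7c2e","at":"2025-12-15T14:30:00Z"}\n',
    "\n",
    "event: alert\n",
    'data: {"id":"al-0099","severity":"CRITICAL","title":"Connector teradata-prod down"}\n',
    "\n",
    "event: heartbeat\n",
    "data: {}\n",
    "\n",
]
events = list(parse_sse(sample))
```

In a real client the lines would come from the open HTTP response instead of a list, and the loop would reopen the stream with the same severity filter once the server closes it.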
Notification channels
Notification channels deliver alerts to external destinations.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/monitor/channels | listNotificationChannels | List notification channels | JWT |
| POST | /api/v1/monitor/channels | createNotificationChannel | Create a notification channel | JWT |
| GET | /api/v1/monitor/channels/{id} | getNotificationChannel | Get a channel by ID | JWT |
| PUT | /api/v1/monitor/channels/{id} | updateNotificationChannel | Update a channel | JWT |
| DELETE | /api/v1/monitor/channels/{id} | deleteNotificationChannel | Delete a channel | JWT |
| POST | /api/v1/monitor/channels/{id}/test | testNotificationChannel | Send a test notification | JWT |
listNotificationChannels accepts workspaceId (uuid), channelType, and enabled (bool). NotificationChannelDto.channelType: EMAIL | SLACK | PAGERDUTY | WEBHOOK | TEAMS. The config object is channel-specific: EMAIL uses recipients, SLACK and TEAMS use webhookUrl, PAGERDUTY uses routingKey, and WEBHOOK uses url.
// POST /api/v1/monitor/channels → 201 Created
// Request: NotificationChannelDto
{
"name": "Billing team Slack",
"channelType": "SLACK",
"enabled": true,
"config": { "webhookUrl": "https://hooks.slack.com/services/T000/B000/XXXX" }
}
// POST /api/v1/monitor/channels/{id}/test → 200 OK (ChannelTestResult)
{
"success": true,
"message": "Test notification delivered",
"testedAt": "2025-12-15T14:30:00Z"
}
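Since the config key depends on channelType, a client can check a NotificationChannelDto before posting it. This is a sketch under assumptions: missing_config_key is a hypothetical helper, it checks only the one key per channel type named on this page, and the service may require or accept other fields.

```python
from typing import Optional

# Channel type -> config key, as listed on this page.
REQUIRED_CONFIG_KEY = {
    "EMAIL": "recipients",
    "SLACK": "webhookUrl",
    "TEAMS": "webhookUrl",
    "PAGERDUTY": "routingKey",
    "WEBHOOK": "url",
}


def missing_config_key(channel: dict) -> Optional[str]:
    """Return the config key the channel lacks, or None if it is present."""
    channel_type = channel.get("channelType")
    if channel_type not in REQUIRED_CONFIG_KEY:
        raise ValueError(f"unknown channelType: {channel_type!r}")
    key = REQUIRED_CONFIG_KEY[channel_type]
    return None if key in channel.get("config", {}) else key


slack_channel = {
    "name": "Billing team Slack",
    "channelType": "SLACK",
    "enabled": True,
    "config": {"webhookUrl": "https://hooks.slack.com/services/T000/B000/XXXX"},
}
problem = missing_config_key(slack_channel)
```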
AI Copilot
The Copilot service (port 8086) is proxied through the gateway. It is surfaced by a single gateway stub operation rather than a formal OpenAPI spec.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/ai/copilot | copilotEndpoint | AI copilot (proxied to copilot-service) | JWT |
// GET /api/v1/ai/copilot → 200 OK
{
"service": "copilot",
"status": "available"
}
Note
The Copilot, Migration, and CDC services are proxy-only: they are reachable through the gateway but are not formally specified beyond these gateway stub operations. The shapes shown here are illustrative; consult the service teams for the full request and response contracts.
Migration Engine
The Migration service (port 8087) is proxied through the gateway and exposes migration plans.
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/migration/plans | migrationPlans | Migration plans (proxied to migration-service) | JWT |
// GET /api/v1/migration/plans → 200 OK
{
"plans": [
{ "id": "mig-0001", "name": "Teradata → Snowflake DWH migration", "status": "IN_PROGRESS" }
]
}
CDC connectors
CDC connector operations are proxied to the Connector SDK (port 8088).
| Method | Path | Operation ID | Purpose | Auth |
|---|---|---|---|---|
| GET | /api/v1/cdc/connectors | listCdcConnectors | List CDC connectors | JWT |
| POST | /api/v1/cdc/connectors | deployCdcConnector | Deploy a CDC connector | JWT |
| GET | /api/v1/cdc/health | cdcHealth | CDC health summary | JWT |
deployCdcConnector accepts a generic object body and returns 201 Created.
// POST /api/v1/cdc/connectors → 201 Created
// Request: generic connector configuration object
{
"name": "oracle-orders-cdc",
"sourceType": "ORACLE",
"tables": ["SALES.ORDERS", "SALES.ORDER_ITEMS"]
}
// GET /api/v1/cdc/health → 200 OK
{
"status": "UP",
"connectors": 6,
"running": 6,
"failed": 0
}
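deployCdcConnector takes a JSON body, so the request needs a serialized payload and a Content-Type header in addition to the bearer JWT. The sketch below builds such a request. GATEWAY_URL is a placeholder (this page does not give the gateway address), build_deploy_request is a hypothetical helper, and the body fields mirror the illustrative example above rather than a confirmed contract.

```python
import json
import urllib.request

# Placeholder: the gateway host and port are not given on this page.
GATEWAY_URL = "http://gateway.example"


def build_deploy_request(config: dict, token: str) -> urllib.request.Request:
    """Build a POST request for deployCdcConnector with a JSON body."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL + "/api/v1/cdc/connectors",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


connector = {
    "name": "oracle-orders-cdc",
    "sourceType": "ORACLE",
    "tables": ["SALES.ORDERS", "SALES.ORDER_ITEMS"],
}
deploy_request = build_deploy_request(connector, "<jwt>")
# A successful call is documented to return 201 Created.
```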