Metrics

kontinue workers expose Prometheus metrics for monitoring execution health and performance. By default, metrics are available at :8080/metrics on each worker pod.

Available Metrics

kontinue_executions_running

Type: Gauge

Number of executions currently running on the worker.

Example:

kontinue_executions_running{function="process-order",namespace="default",worker="worker-abc123"} 3

kontinue_execution_latency

Type: Histogram

Execution latency in seconds, measured from when the worker starts processing an execution until it completes, whether successfully or with an error.

Example:

kontinue_execution_latency_bucket{function="send-email",namespace="default",worker="worker-abc123",le="1"} 45
kontinue_execution_latency_bucket{function="send-email",namespace="default",worker="worker-abc123",le="10"} 98
kontinue_execution_latency_sum{function="send-email",namespace="default",worker="worker-abc123"} 234.5
kontinue_execution_latency_count{function="send-email",namespace="default",worker="worker-abc123"} 100
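The _sum, _count, and _bucket series can be combined to get a mean latency and the share of executions that finished within a given bound. The following is a minimal sketch of that arithmetic using the example values above; the histogramSample type and both helper functions are hypothetical names used for illustration and are not part of kontinue.

```go
package main

import "fmt"

// histogramSample holds the values from one scrape of a Prometheus histogram.
// This type is an illustrative assumption, not a kontinue API.
type histogramSample struct {
	buckets map[float64]uint64 // cumulative counts keyed by upper bound (le)
	sum     float64            // total of all observed latencies, in seconds
	count   uint64             // number of observations
}

// meanLatency returns sum/count, or 0 when nothing has been observed.
func meanLatency(h histogramSample) float64 {
	if h.count == 0 {
		return 0
	}
	return h.sum / float64(h.count)
}

// fractionWithin returns the share of observations at or below the given
// bucket bound. The boolean is false if the bound is not a known bucket
// or if there are no observations.
func fractionWithin(h histogramSample, le float64) (float64, bool) {
	c, ok := h.buckets[le]
	if !ok || h.count == 0 {
		return 0, false
	}
	return float64(c) / float64(h.count), true
}

func main() {
	// Values copied from the send-email example samples.
	h := histogramSample{
		buckets: map[float64]uint64{1: 45, 10: 98},
		sum:     234.5,
		count:   100,
	}
	fmt.Printf("mean latency: %.3fs\n", meanLatency(h))
	if f, ok := fractionWithin(h, 1); ok {
		fmt.Printf("finished within 1s: %.0f%%\n", f*100)
	}
}
```

Because bucket counts are cumulative, the le="10" bucket already includes everything counted in le="1".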

kontinue_executions_completed

Type: Counter

Total number of executions completed. Incremented when an execution finishes, whether successfully or with an error.

Example:

kontinue_executions_completed{function="process-payment",result="success"} 1523
kontinue_executions_completed{function="process-payment",result="error"} 12
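Splitting the counter by result makes it straightforward to derive an error ratio. The sketch below applies that calculation to the example values above; errorRatio is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// errorRatio returns the share of completed executions that ended in error.
// It returns 0 when nothing has completed, which avoids dividing by zero.
func errorRatio(success, failed float64) float64 {
	total := success + failed
	if total == 0 {
		return 0
	}
	return failed / total
}

func main() {
	// Values copied from the process-payment example samples.
	ratio := errorRatio(1523, 12)
	fmt.Printf("error ratio: %.2f%%\n", ratio*100)
}
```

Prometheus counters are cumulative, so for alerting it is usually more useful to apply the same ratio to the increase between two scrapes rather than to the raw totals.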

kontinue_execution_states

Type: Gauge

Number of executions in each state. Updated every 30 seconds by listing all Execution custom resources (CRs), so values can lag the actual state by up to 30 seconds.

Example:

kontinue_execution_states{function="daily-report",state="Pending",namespace="default",worker=""} 5
kontinue_execution_states{function="daily-report",state="Executing",namespace="default",worker="worker-abc123"} 2
kontinue_execution_states{function="daily-report",state="Suspended",namespace="default",worker="worker-abc123"} 10
kontinue_execution_states{function="daily-report",state="Completed",namespace="default",worker="worker-abc123"} 150
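Conceptually, each refresh of this gauge is a tally of the listed executions grouped by label values. The following sketch shows such a tally for the function and state labels only. It is not kontinue's implementation: the execution and stateKey types are simplified, hypothetical stand-ins for the real Execution CR.

```go
package main

import (
	"fmt"
	"sort"
)

// execution is a hypothetical, simplified stand-in for an Execution CR.
type execution struct {
	Function string
	State    string
}

// stateKey identifies one gauge series by its function and state labels.
type stateKey struct {
	Function string
	State    string
}

// countByState tallies executions per (function, state) pair.
func countByState(execs []execution) map[stateKey]int {
	counts := make(map[stateKey]int)
	for _, e := range execs {
		counts[stateKey{e.Function, e.State}]++
	}
	return counts
}

func main() {
	execs := []execution{
		{"daily-report", "Pending"},
		{"daily-report", "Pending"},
		{"daily-report", "Executing"},
		{"daily-report", "Completed"},
	}
	counts := countByState(execs)

	// Sort the keys so the output order is stable across runs.
	keys := make([]stateKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].Function != keys[j].Function {
			return keys[i].Function < keys[j].Function
		}
		return keys[i].State < keys[j].State
	})
	for _, k := range keys {
		fmt.Printf("function=%q state=%q count=%d\n", k.Function, k.State, counts[k])
	}
}
```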

Configuration

The metrics server bind address can be configured via the MetricsBindAddress option when calling worker.Run():

opts := &worker.Options{
    Namespace:          "default",
    Group:              "my-worker-group",
    MetricsBindAddress: ":9090", // Default is ":8080"
}
worker.Run(ctx, opts)

Set MetricsBindAddress to "0" to disable the metrics server.

Prometheus Setup

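To scrape kontinue worker metrics with Prometheus, add a scrape configuration targeting the worker pods. The example below keeps only pods labeled app=kontinue-worker that expose container port 8080; if you changed MetricsBindAddress, update the port regex to match: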

- job_name: 'kontinue-workers'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: kontinue-worker
      action: keep
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      regex: "8080"
      action: keep