Timeouts
kontinue supports two types of timeouts for Executions: attempt timeouts and overall timeouts. These help prevent runaway workflows and ensure resources are released in a timely manner.
- Attempt timeout: Maximum time for a single execution attempt. Resets on retry or resume; exceeding it is retryable.
- Overall timeout: Maximum wall clock time across all attempts. Not retryable: exceeding it fails the Execution permanently.
Configuration
Configure timeouts in the Execution spec:
```yaml
apiVersion: kontinue.cloud/v1alpha1
kind: Execution
metadata:
  name: my-execution
spec:
  function: my-function
  timeout:
    attempt: 5m   # Max time per attempt
    overall: 1h   # Max total time
```
| Field | Description | Default |
|---|---|---|
| `attempt` | Maximum time for a single attempt. Resets on retry or resume. Retryable. | No limit |
| `overall` | Maximum wall clock time from first start. Not retryable. | No limit |
Configuring via SDK
When spawning child executions from within a function, use ExecuteOptions:
```go
result, err := kontinue.Execute[MyResult](ktx, "child-function", &ChildArgs{}, &kontinue.ExecuteOptions{
	Timeout: &kontinue.TimeoutOptions{
		Attempt: &metav1.Duration{Duration: 5 * time.Minute},
		Overall: &metav1.Duration{Duration: 1 * time.Hour},
	},
})
```
When spawning an Execution externally via the client library, pass timeout options in SpawnOptions:
```go
exec, err := client.Spawn(ctx, "my-function", &MyArgs{}, &client.SpawnOptions{
	Timeout: &kontinuev1alpha1.ExecutionTimeout{
		Attempt: &metav1.Duration{Duration: 5 * time.Minute},
		Overall: &metav1.Duration{Duration: 1 * time.Hour},
	},
})
```
Function Defaults
Set default timeout configuration for all executions of a function during registration:
```go
worker.RegisterFunction(w, "deploy-cluster", DeployCluster, &function.Options{
	Description: "Deploy a cluster with timeout protection",
	Defaults: &function.ExecutionDefaults{
		Timeout: &kontinuev1alpha1.ExecutionTimeout{
			Attempt: &metav1.Duration{Duration: 15 * time.Minute},
			Overall: &metav1.Duration{Duration: 2 * time.Hour},
		},
	},
})
```
These defaults apply to all Executions of the function unless overridden at creation time.
Attempt Timeout
The attempt timeout limits how long a single execution attempt can run. The timer starts when the worker begins processing the Execution and resets any time the Execution is retried or resumed (e.g., after a worker crash).
When an attempt timeout is exceeded:
- The Execution is marked as `Failed` with the message “attempt timeout exceeded”.
- If retries remain, the Execution transitions to `Backoff` and is retried.
- If no retries remain, the Execution stays in the `Failed` state.
```yaml
spec:
  function: process-data
  timeout:
    attempt: 5m
  retry:
    retries: 3
```
This is useful for operations that should complete quickly per-attempt but may need multiple tries due to transient issues.
Overall Timeout
The overall timeout limits the total wall clock time since the Execution first started (`status.startedAt`). This timeout is not retryable: once it is exceeded, the Execution fails permanently regardless of remaining retries.
```yaml
spec:
  function: long-workflow
  timeout:
    overall: 24h
  retry:
    retries: 10  # Won't help if the overall timeout is reached
```
This is useful for enforcing SLAs or preventing workflows from running indefinitely due to repeated transient failures.
Combined Timeouts
You can use both timeouts together. At any moment, whichever deadline comes first (the current attempt's or the overall one) takes effect:
```yaml
spec:
  function: deploy-workflow
  timeout:
    attempt: 15m  # Each attempt limited to 15 minutes
    overall: 2h   # Total workflow limited to 2 hours
  retry:
    retries: 5
```
Behavior:
- If an attempt exceeds 15 minutes, it fails and can retry (if retries remain)
- If the total time from first start exceeds 2 hours, the Execution fails permanently
- Time spent in earlier attempts and in backoff counts against the overall timeout
Example timeline (each attempt runs into its 15m attempt timeout, and retries remain when the 2h mark arrives):
```
Start ─────┬─────────────────────────────────────────────────> Time
           │
Attempt 1: 15m (timeout) → Backoff → Retry
Attempt 2: 15m (timeout) → Backoff → Retry
Attempt 3: 15m (timeout) → Backoff → Retry
...
At 2h mark: Overall timeout → Failed (no more retries possible)
```