Timeouts

kontinue supports two types of timeouts for Executions: attempt timeouts and overall timeouts. These help prevent runaway workflows and ensure resources are released in a timely manner.

  • Attempt timeout: Maximum time for a single execution attempt. Resets on retry or resume.
  • Overall timeout: Maximum wall clock time across all attempts. Not retryable.

Configuration

Configure timeouts in the Execution spec:

apiVersion: kontinue.cloud/v1alpha1
kind: Execution
metadata:
  name: my-execution
spec:
  function: my-function
  timeout:
    attempt: 5m    # Max time per attempt
    overall: 1h    # Max total time

Field     Description                                                               Default
attempt   Maximum time for a single attempt. Resets on retry or resume. Retryable.  No limit
overall   Maximum wall clock time from first start. Not retryable.                  No limit

Configuring via SDK

When spawning child executions from within a function, use ExecuteOptions:

result, err := kontinue.Execute[MyResult](ktx, "child-function", &ChildArgs{}, &kontinue.ExecuteOptions{
    Timeout: &kontinue.TimeoutOptions{
        Attempt: &metav1.Duration{Duration: 5 * time.Minute},
        Overall: &metav1.Duration{Duration: 1 * time.Hour},
    },
})

When spawning an Execution externally via the client library, pass timeout options in SpawnOptions:

exec, err := client.Spawn(ctx, "my-function", &MyArgs{}, &client.SpawnOptions{
    Timeout: &kontinuev1alpha1.ExecutionTimeout{
        Attempt: &metav1.Duration{Duration: 5 * time.Minute},
        Overall: &metav1.Duration{Duration: 1 * time.Hour},
    },
})

Function Defaults

Set default timeout configuration for all executions of a function during registration:

worker.RegisterFunction(w, "deploy-cluster", DeployCluster, &function.Options{
    Description: "Deploy a cluster with timeout protection",
    Defaults: &function.ExecutionDefaults{
        Timeout: &kontinuev1alpha1.ExecutionTimeout{
            Attempt: &metav1.Duration{Duration: 15 * time.Minute},
            Overall: &metav1.Duration{Duration: 2 * time.Hour},
        },
    },
})

These defaults apply to all Executions of the function unless overridden at creation time.

Attempt Timeout

The attempt timeout limits how long a single execution attempt can run. The timer starts when the worker begins processing the Execution and resets any time the Execution is retried or resumed (e.g., after a worker crash).

When an attempt timeout is exceeded:

  1. The Execution is marked as Failed with the message “attempt timeout exceeded”
  2. If retries are configured, the Execution transitions to Backoff and retries
  3. If no retries remain, the Execution stays in Failed state

For example, with three retries configured:

spec:
  function: process-data
  timeout:
    attempt: 5m
  retry:
    retries: 3

This is useful for operations that should complete quickly per-attempt but may need multiple tries due to transient issues.
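
The three steps above can be sketched as a small decision function. This is a simplified model, not the controller's code: Phase and phaseAfterAttemptTimeout are hypothetical names, and the sketch assumes retries counts attempts beyond the first, so retries: 3 allows four attempts in total.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the Execution phase names used above.
type Phase string

const (
	PhaseFailed  Phase = "Failed"
	PhaseBackoff Phase = "Backoff"
)

// phaseAfterAttemptTimeout returns the phase an Execution settles in once an
// attempt has timed out and been marked Failed. attemptsMade includes the
// attempt that just timed out; retries is assumed to mean retries beyond the
// first attempt.
func phaseAfterAttemptTimeout(attemptsMade, retries int) Phase {
	if attemptsMade <= retries {
		return PhaseBackoff // a retry remains: wait, then try again
	}
	return PhaseFailed // retries exhausted: stay Failed
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		fmt.Printf("attempt %d timed out -> %s\n", attempt, phaseAfterAttemptTimeout(attempt, 3))
	}
}
```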

Overall Timeout

The overall timeout limits the total wall clock time measured from when the Execution first started (status.startedAt), so time spent on earlier attempts and retries counts against it. This timeout is not retryable: once it is exceeded, the Execution fails permanently regardless of remaining retries.

spec:
  function: long-workflow
  timeout:
    overall: 24h
  retry:
    retries: 10  # Won't help if overall timeout is reached

This is useful for enforcing SLAs or preventing workflows from running indefinitely due to repeated transient failures.

Combined Timeouts

You can use both timeouts together. Whichever deadline is reached first takes effect:

spec:
  function: deploy-workflow
  timeout:
    attempt: 15m   # Each attempt limited to 15 minutes
    overall: 2h    # Total workflow limited to 2 hours
  retry:
    retries: 5

Behavior:

  • If an attempt exceeds 15 minutes, it fails and can retry (if retries remain)
  • If the total time from first start exceeds 2 hours, the Execution fails permanently
  • Retries consume time from the overall timeout

Example timeline:

Start ─────┬─────────────────────────────────────────────────> Time

   Attempt 1: 15m (timeout) → Backoff → Retry
   Attempt 2: 15m (timeout) → Backoff → Retry
   Attempt 3: 15m (timeout) → Backoff → Retry
   ...
   At 2h mark: Overall timeout → Failed (no more retries possible)