Durability
kontinue provides durable execution by persisting workflow state to Kubernetes. Every child reference and intermediate result is stored as part of the Execution resource status. This means your workflows survive worker crashes, restarts, and deployments.
When a function calls kontinue.Sleep(), spawns a sub-execution, or stores state, kontinue
updates the Execution status in Kubernetes. This persisted state becomes the source of truth,
allowing any worker to resume the execution exactly where it left off.
Resumptions
An Execution may be “resumed” at any point due to:
- Worker failure or crash (or failure of the underlying Nodes)
- Worker restart or redeployment
- Manual intervention via CLI or API
When an Execution is resumed, the function is re-executed from the beginning. However,
kontinue uses the stored state to avoid repeating work. Results from sub-executions, calls to
kontinue.Sleep(), and stored values are effectively memoized.
func LongWorkflow(ktx *kontinue.ExecutionContext, args *Args) error {
	// If this completed before a crash, the result is reused on resume
	result, err := kontinue.Execute[OpResult](ktx, "expensive-operation", &OpArgs{}, &kontinue.ExecuteOptions{})
	if err != nil {
		return err
	}
	// If we crashed during this sleep, we resume from where we left off
	if err := kontinue.Sleep(ktx, 1*time.Hour); err != nil {
		return err
	}
	// This only runs after the sleep completes
	_, err = kontinue.Execute[FinalResult](ktx, "final-step", result, &kontinue.ExecuteOptions{})
	return err
}
The status.resumptions field tracks how many times an Execution was picked up by a
worker. You can access this in your function via ktx.Resumptions().
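The replay behaviour described in this section can be illustrated with a small, self-contained model. This is only a sketch and does not use the kontinue SDK: the `journal`, `run`, `step`, and `attempt` names are hypothetical, the in-memory map stands in for the persisted Execution status, and the `attempts` counter is only loosely analogous to status.resumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// journal is a toy stand-in for persisted Execution status: it maps the
// position of each step call to its recorded result (hypothetical layout).
type journal struct {
	results  map[int]string
	attempts int
}

// run is a per-attempt context. The cursor restarts at zero on every attempt,
// so the same call order maps onto the same journal entries.
type run struct {
	j      *journal
	cursor int
}

var errCrash = errors.New("simulated worker crash")

// step returns the recorded result if this position already completed;
// otherwise it runs fn and records the result before returning it.
func (r *run) step(fn func() (string, error)) (string, error) {
	idx := r.cursor
	r.cursor++
	if v, ok := r.j.results[idx]; ok {
		return v, nil
	}
	v, err := fn()
	if err != nil {
		return "", err
	}
	r.j.results[idx] = v
	return v, nil
}

// attempt re-executes the whole workflow from the beginning against the journal.
func attempt(j *journal, workflow func(*run) error) error {
	j.attempts++
	return workflow(&run{j: j})
}

// simulate runs a two-step workflow whose second step crashes once, and
// reports how often each step body actually ran and how many attempts it took.
func simulate() (expensiveRuns, finalRuns, attempts int) {
	j := &journal{results: map[int]string{}}
	crashOnce := true
	workflow := func(r *run) error {
		out, err := r.step(func() (string, error) {
			expensiveRuns++
			return "artifact-1", nil
		})
		if err != nil {
			return err
		}
		_, err = r.step(func() (string, error) {
			if crashOnce {
				crashOnce = false
				return "", errCrash
			}
			finalRuns++
			return "deployed " + out, nil
		})
		return err
	}
	for attempt(j, workflow) != nil {
		// Keep resuming until the workflow finishes.
	}
	return expensiveRuns, finalRuns, j.attempts
}

func main() {
	e, f, a := simulate()
	fmt.Printf("expensive=%d final=%d attempts=%d\n", e, f, a)
}
```

In this model the workflow function runs twice, but the body of the first step runs only once: the second attempt reads its result back from the journal and carries on from the step that failed.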
Determinism
Executions must be deterministic to get proper durability guarantees. This is because child resources (nested Executions, Jobs, Suspensions) are named deterministically based on execution order.
For example, if your function spawns two sub-executions:
func Deploy(ktx *kontinue.ExecutionContext, args *DeployArgs) error {
	// First sub-execution gets a deterministic name based on order
	_, err := kontinue.Execute[TestResult](ktx, "run-tests", &TestArgs{}, &kontinue.ExecuteOptions{})
	if err != nil {
		return err
	}
	// Second sub-execution gets a different deterministic name
	_, err = kontinue.Execute[DeployResult](ktx, "deploy-prod", &DeployProdArgs{}, &kontinue.ExecuteOptions{})
	return err
}
If the function is resumed and the order of these calls (or their arguments) has changed, kontinue will not find the previously checkpointed state and may re-execute the children.
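To see why call order matters, here is a minimal sketch of order-based naming. It is not kontinue's actual scheme: the `namer` and `plan` helpers and the `<parent>-<n>` name format are assumptions made for illustration, and the model ignores call arguments entirely.

```go
package main

import "fmt"

// namer is a toy model of order-based naming: the n-th call within an
// attempt is named "<parent>-<n>" (hypothetical format).
type namer struct {
	parent string
	next   int
}

// name returns the next child name and advances the counter.
func (n *namer) name() string {
	s := fmt.Sprintf("%s-%d", n.parent, n.next)
	n.next++
	return s
}

// plan returns the child name each function would receive for a given call order.
func plan(parent string, order []string) map[string]string {
	n := &namer{parent: parent}
	names := make(map[string]string, len(order))
	for _, fn := range order {
		names[fn] = n.name()
	}
	return names
}

func main() {
	first := plan("deploy-abc", []string{"run-tests", "deploy-prod"})
	resumed := plan("deploy-abc", []string{"deploy-prod", "run-tests"})
	// Same function, different name once the order flips.
	fmt.Println(first["run-tests"], resumed["run-tests"])
}
```

When the order is unchanged, both attempts derive the same names and the earlier children are found again. When it flips, "run-tests" looks for a child under a name that previously belonged to "deploy-prod", which is the mismatch described above.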
Rules for determinism:
- Don’t use time.Now() for branching logic (use kontinue.Store() instead)
- Don’t use random values for control flow (store them first)
- Don’t rely on external state that may change between resumes
- Don’t change the order of kontinue API calls based on non-deterministic data
- Determinism must also be preserved when the code for a Function is updated while Executions are still running
Stored State
The kontinue.Store() function caches a value in the Execution status so that resumed attempts
do not recompute it. This is useful for:
- Non-deterministic operations: Capture values that might change between resumes (e.g. computing an ID)
- Expensive computations: Avoid repeating costly work
- External API calls: Ensure idempotency for operations with side effects
func ProcessClusters(ktx *kontinue.ExecutionContext, args *OrderArgs) error {
	// Store the list of clusters so it's consistent across resumes
	clusters, err := kontinue.Store(ktx, func() ([]string, error) {
		return fetchAvailableClusters()
	})
	if err != nil {
		return err
	}
	// Store a generated ID to ensure idempotency
	resourceID, err := kontinue.Store(ktx, func() (string, error) {
		return uuid.New().String(), nil
	})
	if err != nil {
		return err
	}
	// Use the stored values - they won't change on resume
	return deployToCluster(clusters[0], resourceID)
}
Stored state is persisted in status.state on the Execution resource and survives any
number of resumes or retries.
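The following sketch shows the general idea behind capturing a non-deterministic value once and branching on the captured copy. It does not call the kontinue SDK: the `state` type and the `store` and `deadlinePassed` helpers are hypothetical, and serializing values with JSON in call order is an assumption about how such a cache could work, not a description of status.state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// state is a toy stand-in for persisted stored state: serialized values
// kept in the order the store calls were made (hypothetical layout).
type state struct {
	values [][]byte
	cursor int
}

// rewind resets the cursor, as a fresh attempt starting from the top would.
func (s *state) rewind() { s.cursor = 0 }

// store returns the value recorded at the current position, or computes,
// serializes, and records it if this position has not been reached before.
func store[T any](s *state, fn func() (T, error)) (T, error) {
	idx := s.cursor
	s.cursor++
	var v T
	if idx < len(s.values) {
		err := json.Unmarshal(s.values[idx], &v)
		return v, err
	}
	v, err := fn()
	if err != nil {
		return v, err
	}
	raw, err := json.Marshal(v)
	if err != nil {
		return v, err
	}
	s.values = append(s.values, raw)
	return v, nil
}

// deadlinePassed captures "now" once and branches on the captured value,
// so every attempt takes the same branch regardless of the wall clock.
func deadlinePassed(s *state, now func() time.Time, deadline time.Time) (bool, error) {
	s.rewind()
	started, err := store(s, func() (time.Time, error) { return now(), nil })
	if err != nil {
		return false, err
	}
	return started.After(deadline), nil
}

// demo runs one attempt before the deadline and a resumed attempt after it.
func demo() (first, resumed bool) {
	s := &state{}
	deadline := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	before := func() time.Time { return deadline.Add(-time.Minute) }
	after := func() time.Time { return deadline.Add(time.Hour) }
	first, _ = deadlinePassed(s, before, deadline)
	resumed, _ = deadlinePassed(s, after, deadline) // clock moved on, branch did not
	return first, resumed
}

func main() {
	fmt.Println(demo())
}
```

Both attempts report the same answer even though the clock has passed the deadline by the second one, because the second attempt reads the timestamp recorded by the first instead of sampling the clock again.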
Named Steps
One way to avoid determinism issues is to use named steps, which override the
deterministic name generation. Every SDK operation that generates a name accepts
a StepOptions option for customizing that name, for example:
kontinue.Execute[Result](ktx, "foo", &FooArgs{}, &kontinue.ExecuteOptions{
	StepOptions: kontinue.StepOptions{
		Name: "deterministic-name-abc123",
	},
})
Overriding Name can be used to ensure that named state is deterministic even if
the preceding execution is not. Note that Name must be unique within the
Kubernetes namespace, as it is used to name Kubernetes resources.
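As a rough illustration of why an explicit name helps, the sketch below keys each call by its name when one is given and by its position otherwise, and rejects duplicate keys. It is a toy model only: `call`, `stepKey`, `assign`, and the `step-<n>` fallback format are hypothetical and are not part of the kontinue SDK.

```go
package main

import "fmt"

// call describes one step invocation in the toy model.
type call struct {
	fn   string // function being invoked
	name string // optional explicit step name; empty means "use position"
}

// stepKey picks the state key for a call: the explicit name if present,
// otherwise a positional fallback (hypothetical format).
func stepKey(explicit string, position int) string {
	if explicit != "" {
		return explicit
	}
	return fmt.Sprintf("step-%d", position)
}

// assign maps each function to its key and rejects duplicate keys, echoing
// the requirement that names be unique within a namespace.
func assign(calls []call) (map[string]string, error) {
	seen := map[string]bool{}
	out := map[string]string{}
	for i, c := range calls {
		k := stepKey(c.name, i)
		if seen[k] {
			return nil, fmt.Errorf("duplicate step name %q", k)
		}
		seen[k] = true
		out[c.fn] = k
	}
	return out, nil
}

// stableUnderReorder reports whether "deploy-prod" keeps the same key when
// the two calls swap places, with or without explicit names.
func stableUnderReorder(named bool) bool {
	a := []call{{fn: "run-tests"}, {fn: "deploy-prod"}}
	if named {
		a[0].name, a[1].name = "rollout-42-tests", "rollout-42-deploy"
	}
	b := []call{a[1], a[0]}
	ka, errA := assign(a)
	kb, errB := assign(b)
	if errA != nil || errB != nil {
		return false
	}
	return ka["deploy-prod"] == kb["deploy-prod"]
}

func main() {
	fmt.Println(stableUnderReorder(false), stableUnderReorder(true))
}
```

Without names, swapping the calls changes the key for "deploy-prod"; with names, the key follows the step rather than its position. The duplicate check shows the cost of that flexibility: the caller becomes responsible for choosing names that do not collide.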