# Sharding
By default, kontinue runs all Executions on a single leader replica. Sharding enables horizontal scaling by distributing Executions across multiple replicas, such as pods in a Deployment.
## When to Use Sharding
Sharding is useful when your workload is CPU or memory-bound on a single replica and you need to spread the load across multiple pods.
For simple cases, vertical scaling is more efficient. Sharding adds coordination overhead (lease management, shard assignment, health checking) that is unnecessary if a single replica can handle your workload. Sharding will not improve the latency of lightweight Executions, and it will not increase the scaling limit if the workload is bottlenecked on the Kubernetes API or etcd.
Consider sharding when:
- A single replica is CPU or memory-constrained running Execution logic
- You have many concurrent Executions with non-trivial compute work
- You need fault isolation between subsets of Executions
## Enabling Sharding
Enable sharding by setting the Sharding option on the worker:
```go
w := worker.NewWorker()
err := w.Run(ctx, &worker.Options{
    Namespace: "default",
    Group:     "my-workers",
    Sharding:  &worker.ShardingOptions{},
})
```
This uses the default lease configuration (10s renew interval, 40s lease duration). You can customize these values:
```go
Sharding: &worker.ShardingOptions{
    LeaseRenewInterval: 5 * time.Second,  // how often each replica renews its lease
    LeaseDuration:      20 * time.Second, // how long before an unrenewed lease expires
},
```
Shorter intervals mean faster detection of dead replicas, but more write traffic to the Kubernetes API server.
## How It Works
When sharding is enabled:
- Each replica advertises itself by creating and periodically renewing a `coordination.k8s.io/Lease` object. The shard ID is the pod hostname, which gives stable identities for StatefulSets and unique identities for Deployments.
- The leader assigns shards. A coordinator running on the leader watches for new Executions without a `kontinue.cloud/shard` label and assigns them to alive replicas using round-robin.
- Each replica processes its own shard. The ExecutionReconciler filters Executions by its shard label, so each replica only reconciles the Executions assigned to it.
- Dead replicas are detected automatically. The leader periodically checks for expired leases and reassigns those Executions to alive replicas.
Leader election remains enabled. The leader handles shard coordination in addition to running its own shard of Executions. Standard controllers (Suspension, ScheduledExecution) continue to run only on the leader.
## Child Execution Placement
By default, child Executions inherit the parent’s shard label, keeping the parent and child on the same replica. This is required for goroutine-based parent/child coordination to work correctly.
If you have large child Executions that would benefit from being distributed across replicas, you can opt out of shard inheritance:
```go
result, err := kontinue.Execute[MyResult](ktx, "heavy-task", args, &kontinue.ExecuteOptions{
    DisableShardInheritance: true,
})
```
When `DisableShardInheritance` is set, the child Execution won't inherit the parent's shard label; the shard coordinator will assign it to any available replica, which helps distribute the load of large child Executions more evenly.
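The placement rule reduces to a small decision, sketched here in plain Go (the helper name and signature are illustrative, not kontinue internals): the child either copies the parent's `kontinue.cloud/shard` label, or leaves it empty for the coordinator to fill in.

```go
package main

import "fmt"

const shardLabel = "kontinue.cloud/shard"

// childShard decides the initial shard label for a child Execution: by
// default it inherits the parent's shard so both run on the same replica;
// with inheritance disabled it stays empty until the coordinator assigns one.
func childShard(parentLabels map[string]string, disableInheritance bool) string {
	if disableInheritance {
		return "" // unassigned; picked up by the shard coordinator
	}
	return parentLabels[shardLabel]
}

func main() {
	parent := map[string]string{shardLabel: "worker-1"}
	fmt.Println(childShard(parent, false)) // "worker-1": same replica as the parent
	fmt.Println(childShard(parent, true))  // "": any alive replica, chosen by the coordinator
}
```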
## Mutex Executions
Executions with a Mutex are always assigned to the leader’s shard. This ensures
mutex coordination happens on a single replica without distributed locking.
## Querying by Shard
You can filter Executions by shard using the `kontinue.cloud/shard` label:
```sh
# List executions assigned to a specific shard
kubectl get executions -l kontinue.cloud/shard=my-worker-0

# List executions that haven't been assigned yet
kubectl get executions -l '!kontinue.cloud/shard'
```