Retries
Retries handle transient failures in your workflows.
They are defined at the task level, where you set how many times a task is attempted and how long to wait between attempts.
What are retries
Retries let you automatically rerun failed tasks. Each retry creates a new task run attempt, based on the retry configuration defined in the flow.
Example
This task retries up to 5 times with a 15-minute interval between attempts:
- id: retry_sample
  type: io.kestra.plugin.core.log.Log
  message: my output for task {{task.id}}
  timeout: PT10M
  retry:
    type: constant
    maxAttempts: 5
    interval: PT15M
In this example, the task fails 4 times, with retries 0.25 seconds apart, and succeeds on the 5th attempt. The command reads {{ taskrun.attemptsCount }} to track retries and exits successfully once the count reaches 4:
id: retry
namespace: company.team
description: This flow retries 4 times and succeeds on the 5th attempt

tasks:
  - id: failed
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    commands:
      - 'if [ "{{taskrun.attemptsCount}}" -eq 4 ]; then exit 0; else exit 1; fi'
    retry:
      type: constant
      interval: PT0.25S
      maxAttempts: 5
      maxDuration: PT1M
      warningOnRetry: true

errors:
  - id: never_happen
    type: io.kestra.plugin.core.debug.Return
    format: Never happened {{task.id}}
Timeout vs. Max Retry Duration
- timeout: Maximum duration for a single task attempt (initial or retry). If exceeded, the attempt fails.
- retry.maxDuration: Maximum total time allowed for the task, including all attempts and delays. Once exceeded, retries stop.
Example: With timeout: PT10M and maxDuration: PT30M:
- Each attempt can last up to 10 minutes.
- Retries stop once 30 minutes have elapsed in total, counting all attempts and the delays between them.
⚠️ Ensure retry.interval is smaller than maxDuration, or retries may not run.
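The two limits can be combined on a single task. The following is a minimal sketch rather than a reference configuration: the flow id, task id, and shell command are hypothetical placeholders, while the task type and retry properties are the ones used elsewhere on this page.

```yaml
id: timeout_vs_max_duration        # hypothetical flow id
namespace: company.team

tasks:
  - id: slow_call                  # hypothetical task id
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    commands:
      - 'exit 1'                   # placeholder command that always fails
    timeout: PT10M                 # limit for each single attempt
    retry:
      type: constant
      interval: PT1M               # kept smaller than maxDuration
      maxAttempts: 5
      maxDuration: PT30M           # limit across all attempts and delays
```

Here the interval (1 minute) is well below maxDuration (30 minutes), so retries have room to run before the total time budget is used up.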
Retry options
Name | Type | Description |
---|---|---|
type | string | Retry strategy: constant, exponential, or random. |
maxAttempts | integer | Number of retry attempts before stopping. |
maxDuration | Duration | Maximum total time for the task, across all attempts. |
warningOnRetry | Boolean | Marks execution as WARNING if retries occurred (default: false). |
Duration format
Durations use ISO 8601 format (weeks, months, years not supported). Examples:
Value | Description |
---|---|
PT0.25S | 250 ms |
PT2S | 2 seconds |
PT1M | 1 minute |
PT3.5H | 3 hours, 30 minutes |
P6DT4H | 6 days, 4 hours |
Retry types
constant
Retries at fixed intervals. Example: with interval: PT10M, retries occur every 10 minutes.
Name | Type | Description |
---|---|---|
interval | Duration | Delay between attempts. |
exponential
Wait time increases after each retry (e.g., 30s, 1m, 2m, ...).
Name | Type | Description |
---|---|---|
interval | Duration | Base interval between attempts. |
maxInterval | Duration | Maximum interval allowed. |
delayFactor | Double | Multiplier (default: 2). Example: interval 30s → 30s, 1m, 2m, 4m... |
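As an illustration of the properties above, here is a sketch of an exponential retry on a task. The task id and message are hypothetical; the property names come from the tables on this page, and the values are arbitrary examples.

```yaml
- id: exponential_retry_sample     # hypothetical task id
  type: io.kestra.plugin.core.log.Log
  message: attempt for task {{task.id}}
  retry:
    type: exponential
    interval: PT30S                # base interval: first wait is 30 seconds
    delayFactor: 2                 # each wait is twice the previous: 30s, 1m, 2m, 4m
    maxInterval: PT5M              # maximum interval allowed between attempts
    maxAttempts: 5
```

With these values the waits stay below maxInterval for the first few retries. Choosing a maxInterval keeps the delay from growing without bound when maxAttempts is large.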
random
Randomized delays within bounds.
Name | Type | Description |
---|---|---|
minInterval | Duration | Minimum delay. |
maxInterval | Duration | Maximum delay. |
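A random retry might be configured as in the sketch below. The task id and message are hypothetical, and the bounds are example values only.

```yaml
- id: random_retry_sample          # hypothetical task id
  type: io.kestra.plugin.core.log.Log
  message: attempt for task {{task.id}}
  retry:
    type: random
    minInterval: PT5S              # shortest possible delay between attempts
    maxInterval: PT30S             # longest possible delay between attempts
    maxAttempts: 3
```

Randomized delays are commonly used so that many tasks failing at the same moment do not all retry at the same instant.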
Configuring retries globally
You can configure retries globally for all tasks in Kestra:
kestra:
  plugins:
    configurations:
      - type: io.kestra
        values:
          retry:
            type: constant
            maxAttempts: 3
            interval: PT30S
This applies a constant retry policy to every task: up to 3 attempts, 30 seconds apart.
Flow-level retries
You can retry at the flow level, restarting either the entire execution or just failed tasks. Options:
- CREATE_NEW_EXECUTION: Start a new execution.
- RETRY_FAILED_TASK: Retry only the failed task.
id: flow_level_retry
namespace: company.team

retry:
  maxAttempts: 3
  behavior: CREATE_NEW_EXECUTION # or RETRY_FAILED_TASK
  type: constant
  interval: PT1S

tasks:
  - id: fail_1
    type: io.kestra.plugin.core.execution.Fail
    allowFailure: true

  - id: fail_2
    type: io.kestra.plugin.core.execution.Fail
- With CREATE_NEW_EXECUTION, the execution attempt increases.
- With RETRY_FAILED_TASK, only the task run attempt increases.
Flow-level retries also restart Subflows as new executions.
Retry vs. Restart vs. Replay
Automatic vs. manual
- Retry: Automatic rerun of failed tasks within the same execution.
- Restart: Manual rerun of failed tasks within the same execution.
- Replay: Manual rerun from any point, creating a new execution.
Restart vs. Replay
- Restart: Reruns only failed tasks in the same execution.
- Replay: Starts a new execution from a chosen task, with a new execution ID. Outputs of previous tasks are reused from cache if needed.
Replays can start from successful or failed tasks but always create a new execution. Restarts keep the same execution ID.
After a Replay, you can still track which execution triggered the new run through the Original Execution field.
Summary
Concept | Scope | Trigger | New execution? |
---|---|---|---|
Retry | Task level | Automatic | No |
Restart | Flow level | Manual | No |
Replay | Flow or task level | Manual | Yes |