Backfills are replays of missed schedule intervals between a defined start and end date.

Let's take the following flow as an example:

yaml
id: scheduled_flow
namespace: company.team

tasks:
  - id: external_system_export
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    commands:
      - echo "processing data for {{ execution.startDate ?? trigger.date }}"
      - sleep $((RANDOM % 5 + 1))

triggers:
  - id: schedule
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "*/30 * * * *"

This flow will run every 30 minutes. However, imagine that your source system had an outage for 5 hours. The flow will miss 10 executions. To replay these missed executions, you can use the backfill feature.

To backfill the missed executions, go to the Triggers tab on the Flow's detail page and click on the Backfill executions button.

backfill1

You can then select the start and end date for the backfill. Additionally, you can set custom labels for the backfill executions to help you identify them in the future.

You can pause and resume the backfill process at any time, and by clicking on the Details button, you can see more details about that backfill process:

backfill2

Trigger Backfill via an API call

Using cURL

You can invoke the backfill exections using the cURL call as follows:

sh
curl -XPUT http://localhost:8080/api/v1/triggers -H 'Content-Type: application/json' -d '{
  "backfill": {
    "start": "2024-03-28T06:30:00.000Z",
    "end": null,
    "inputs": null,
    "labels": [
      {
        "key": "reason",
        "value": "outage"
      }
    ]
  },
  "flowId": "scheduled_flow",
  "namespace": "dev",
  "triggerId": "schedule"
}'

In the backfill attribute, you need to provide the start time for the backfill. The end time can be optinally provided. You can provide inputs to the flow, if any. You can attach labels to the backfill executions by providing key-value pairs in the labels section. Other attributes to this PUT call are flowId, namespace and triggerId corresponding to the flow that is to backfilled.

Using Python requests

You can invoke the backfill exections using the Python requests as follows:

python
import requests
import json

url = 'http://localhost:8080/api/v1/triggers'

headers = {
    'Content-Type': 'application/json'
}

data = {
  "backfill": {
    "start": "2024-03-28T06:30:00.000Z",
    "end": None,
    "inputs": None,
    "labels": [
      {
        "key": "reason",
        "value": "outage"
      }
    ]
  },
  "flowId": "scheduled_flow",
  "namespace": "dev",
  "triggerId": "schedule"
}

response = requests.put(url, headers=headers, data=json.dumps(data))

print(response.status_code)
print(response.text)

With this code, you will be invoking the backfill for scheduled_flow flow under company.team namespace based on schedule trigger ID within the flow. The number of backfills that will be executed will depend on the schedule present in the schedule trigger, and the start and end times mentioned in the backfill. When the end time is null, as in this case, the end time would be considered as the present time.

Was this page helpful?