Kestra can run some Task like Python, Node or Singer in Docker containers. It is useful when you need to run a task in a specific environment or when you need to run a task that requires specific dependencies.

Defining a Docker runner

To run a task inside a Docker container, you must define the runner as Docker.

yaml
runner: DOCKER

Then you can define the Docker options to indicate which image to use:

yaml
dockerOptions:
  image: civisanalytics/datascience-python # Docker image with Python and Pandas already installed

You can customize the Docker runner a lot. You can find more information on task documentation that can run in Docker, for example, in the Python task.

Using a task with a Docker runner in your flow

While Python is powerful, sometimes it could be more efficient to accomplish tasks with a simple bash command. We will use the library csvkit to create a new CSV with fewer columns and then print some stats.

The library csvkit required dependencies like a specific Python version. To avoid installing the dependencies on each run, we will use a Docker image that already contains the dependencies. This way, you still have a small task definition that does what you want in only three lines.

yaml
  - id: bash
    type: io.kestra.core.tasks.scripts.Bash
    runner: DOCKER
    dockerOptions:
      image: jdkelley/csvkit:latest
    inputFiles:
      data.csv: "{{ outputs.download.uri }}"
    outputFiles:
    - data_update
    commands:
      - "csvcut -d ';' -c annee,conso data.csv > new.csv"
      - "csvstat new.csv"
      - "cat new.csv > {{ outputFiles.data_update }}"
Click here to see the full flow
yaml
id: kestra-tutorial
namespace: io.kestra.tutorial
labels:
  env: PRD
description: |
  # Kestra Tutorial
  As you notice, we can use markdown here.
tasks:
  - id: download
    type: io.kestra.plugin.fs.http.Download
    uri: "https://gist.githubusercontent.com/tchiotludo/2b7f28f4f507074e60150aedb028e074/raw/6b6348c4f912e79e3ffccaf944fd019bf51cba30/conso-elec-gaz-annuelle-par-naf-agregee-region.csv"
    retry:
      type: constant
      maxDuration: PT1H
      interval: PT10M
  - id: parallel
    type: io.kestra.core.tasks.flows.Parallel
    tasks:
      - id: analyze-data-sum
        type: io.kestra.core.tasks.scripts.Python
        runner: DOCKER
        dockerOptions:
          image: python
        inputFiles:
          data.csv: "{{outputs.download.uri}}"
          main.py: |
            import pandas as pd
            from kestra import Kestra
            data = pd.read_csv("data.csv", sep=";")
            sumOfConsumption = data['conso'].sum()
            Kestra.outputs({'sumOfConsumption': int(sumOfConsumption)})
        requirements:
          - pandas
      - id: analyze-data-mean
        type: io.kestra.core.tasks.scripts.Python
        runner: DOCKER
        dockerOptions:
          image: python
        inputFiles:
          data.csv: "{{outputs.download.uri}}"
          main.py: |
            import pandas as pd
            from kestra import Kestra
            data = pd.read_csv("data.csv", sep=";")
            meanOfConsumption = data['conso'].mean()
            Kestra.outputs({'meanOfConsumption': int(meanOfConsumption)})
        requirements:
          - pandas
      - id: bash
        type: io.kestra.core.tasks.scripts.Bash
        runner: DOCKER
        dockerOptions:
          image: jdkelley/csvkit:latest
        inputFiles:
          data.csv: "{{ outputs.download.uri }}"
        outputFiles:
        - data_update
        commands:
          - "csvcut -d ';' -c annee,conso data.csv > new.csv"
          - "csvstat new.csv"
          - "cat new.csv > {{ outputFiles.data_update }}"
errors:
  - id: error-handling
    type: io.kestra.core.tasks.log.Log
    message: "An error occurred."

Bravo 🎉 ! You successfully achieve our tutorial and learn the basics of Kestra!

As the next steps, we suggest reading the following documentation in this order:

  • Learn Kestra concepts.
  • Read the Developer Guide to understand how to build your own flow.
  • Look at Plugins to perform some real tasks.
  • Deploy your Kestra instance to real environments.