Building a Custom Docker Image​Building a ​Custom ​Docker ​Image

Build a custom Docker image for your script tasks.

You can bake all dependencies needed for your script tasks directly into the Kestra's base image. Here is an example installing Python dependencies:

dockerfile
FROM kestra/kestra:latest

USER root
RUN apt-get update -y && apt-get install pip -y

RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:

yaml
services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest

Once you start Kestra containers using docker compose up -d, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS runner:

yaml
id: python_process
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    script: |
      import pandas as pd
      import requests
      import boto3
      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")

Building a custom Docker image for your script tasks

Imagine you use the following flow:

yaml
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

The Python task requires pandas to be installed. Pandas is a large library, and it's not included in the default python image. In this case, you have the following options:

  1. Install pandas in the beforeCommands property of the Python task.
  2. Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
  3. Build your own custom Docker image that includes pandas.

1) Installing pandas in the beforeCommands property

yaml
id: install_pandas_at_runtime
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images

yaml
id: use_prebuilt_image
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image

If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:

dockerfile
FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:

bash
docker build -t kestra-custom:latest .

Finally, use that image in your flow:

yaml
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
    containerImage: kestra-custom:latest # ⚡️ Use your custom image here
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

Note how the pullPolicy: NEVER property is used to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.

If you want to run languages other than Python using a custom Docker image, here is an example with Go.

Was this page helpful?