Build a Custom Docker Image for Script Tasks icon Build a Custom Docker Image for Script Tasks

Build a custom Docker image for your script tasks.

Build custom Docker images for scripts

Use Kestra base image

You can bake all dependencies needed for your script tasks directly into the Kestra’s base image. Here is an example installing Python dependencies:

FROM kestra/kestra:latest
USER root
RUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:

services:
kestra:
build:
context: .
dockerfile: Dockerfile
image: kestra-python:latest

Once you start Kestra containers using docker compose up -d, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS runner:

id: python_process
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
runner: PROCESS
script: |
import pandas as pd
import requests
import boto3
print(f"Pandas version: {pd.__version__}")
print(f"Requests version: {requests.__version__}")
print(f"Boto3 version: {boto3.__version__}")

Building a custom Docker image for your script tasks

Imagine you use the following flow:

id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"

The Python task requires pandas to be installed. Pandas is a large library, and it’s not included in the default python image. In this case, you have the following options:

  1. Install pandas in the beforeCommands property of the Python task.
  2. Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
  3. Build your own custom Docker image that includes pandas.

1) Installing pandas in the beforeCommands property

id: install_pandas_at_runtime
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.core.runner.Process
beforeCommands:
- pip install pyarrow pandas
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images

id: use_prebuilt_image
namespace: company.team
tasks:
- id: custom_dependencies
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
script: |
import pandas as pd
print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image

If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:

FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:

docker build -t kestra-custom:latest .

Finally, use that image in your flow:

id: zip_to_python
namespace: company.team
variables:
file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
- id: get_zipfile
type: io.kestra.plugin.core.http.Download
uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
- id: unzip
type: io.kestra.plugin.compress.ArchiveDecompress
algorithm: ZIP
from: "{{ outputs.get_zipfile.uri }}"
- id: parquet_output
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
containerImage: kestra-custom:latest # ⚡️ Use your custom image here
env:
FILE_ID: "{{ render(vars.file_id) }}"
inputFiles: "{{ outputs.unzip.files }}"
script: |
import os
import pandas as pd
file_id = os.environ["FILE_ID"]
file = f"{file_id}-divvy-tripdata.csv"
df = pd.read_csv(file)
df.to_parquet(f"{file_id}.parquet")
outputFiles:
- "*.parquet"

Note how the pullPolicy: NEVER property is used to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.

If you want to run languages other than Python using a custom Docker image, here is an example with Go.

Was this page helpful?