Build a custom Docker image for your script tasks.
You can bake all dependencies needed for your script tasks directly into Kestra's base image. Here is an example installing Python dependencies:
FROM kestra/kestra:latest
USER root
RUN apt-get update -y && apt-get install -y python3-pip
RUN pip install --no-cache-dir pandas requests boto3
Then, point to that Dockerfile in your docker-compose.yml file:
services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest
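You can then build the custom image and start the containers, for example:

docker compose build kestra   # build the custom image from the Dockerfile above
docker compose up -d          # start Kestra in the background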
Once you've started the Kestra containers with docker compose up -d, you can create a flow that runs Python tasks directly with your custom dependencies using the PROCESS runner:
id: python_process
namespace: company.team

tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    script: |
      import pandas as pd
      import requests
      import boto3

      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")
Building a custom Docker image for your script tasks
Imagine you use the following flow:
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"
      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
The Python task requires pandas to be installed. Pandas is a large library, and it's not included in the default python image. In this case, you have the following options:

- Install pandas in the beforeCommands property of the Python task.
- Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
- Build your own custom Docker image that includes pandas.
1) Installing pandas in the beforeCommands property
id: install_pandas_at_runtime
namespace: company.team

tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd

      print(f"Pandas version: {pd.__version__}")
2) Using one of our pre-built images
id: use_prebuilt_image
namespace: company.team

tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd

      print(f"Pandas version: {pd.__version__}")
3) Building a custom Docker image
If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:
FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion
Then, build the image:
docker build -t kestra-custom:latest .
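Because the flow below references this image with pullPolicy: NEVER, the image must already exist on the Docker host used by your Kestra worker. A quick way to check is:

docker images kestra-custom   # should list the kestra-custom:latest image you just built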
Finally, use that image in your flow:
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from Docker Hub
    containerImage: kestra-custom:latest # ⚡️ Use your custom image here
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"
      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
Note how the pullPolicy: NEVER property is used to make sure that Kestra uses the local image instead of trying to pull it from Docker Hub.
If you want to run languages other than Python using a custom Docker image, here is an example with Go.
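As a minimal sketch of that pattern, you could base a custom image on the official golang image and run a Go program through the generic io.kestra.plugin.scripts.shell.Commands task; the kestra-go:latest image name and the main.go contents below are illustrative assumptions, not a prescribed setup.

FROM golang:1.22
# Add any extra Go tooling or pre-fetched modules your scripts need here

Build it with docker build -t kestra-go:latest . and reference it in a flow:

id: go_in_custom_image
namespace: company.team

tasks:
  - id: run_go
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER # illustrative: use the locally built kestra-go image
    containerImage: kestra-go:latest
    inputFiles:
      main.go: |
        package main

        import "fmt"

        func main() {
            fmt.Println("Hello from Go inside a custom image")
        }
    commands:
      - go run main.go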