Build a Custom Docker Image for Script Tasks
Build a custom Docker image for your script tasks.
Use Kestra base image
You can bake all the dependencies your script tasks need directly into Kestra's base image. Here is an example that installs Python dependencies:
FROM kestra/kestra:latest
USER root
RUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3

Then, point to that Dockerfile in your docker-compose.yml file:
services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest

Once you start Kestra containers using docker compose up -d, you can create a flow that directly runs Python tasks with your custom dependencies using the PROCESS runner:
id: python_process
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    script: |
      import pandas as pd
      import requests
      import boto3
      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")

Building a custom Docker image for your script tasks
Imagine you use the following flow:
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

The Python task requires pandas to be installed. Pandas is a large library, and it’s not included in the default python image. In this case, you have the following options:
- Install pandas in the beforeCommands property of the Python task.
- Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
- Build your own custom Docker image that includes pandas.
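To make the file_id variable in the flow above concrete, the sketch below reproduces the same arithmetic in plain Python: take the execution start date, go back three months, and format the result as yyyyMM. The function name file_id_for and the sample date are illustrative assumptions and not part of Kestra. The sketch also ignores time zones, which the Pebble expression may handle differently.

```python
from datetime import datetime


def file_id_for(start: datetime, months_back: int = 3) -> str:
    """Return the yyyyMM string for the month `months_back` months before `start`.

    Hypothetical helper: it mirrors the intent of
    dateAdd(-3, 'MONTHS') | date('yyyyMM') for illustration only.
    """
    # Count months since year 0 so that year boundaries are handled by divmod.
    total_months = start.year * 12 + (start.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    return f"{year:04d}{month_index + 1:02d}"


if __name__ == "__main__":
    file_id = file_id_for(datetime(2024, 5, 15))
    print(file_id)  # prints 202402
    # The download task would then request this URL:
    print(f"https://divvy-tripdata.s3.amazonaws.com/{file_id}-divvy-tripdata.zip")
```

An execution started in January would resolve to October of the previous year, which is why the month arithmetic goes through a single month counter instead of subtracting from the month number directly.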
1) Installing pandas in the beforeCommands property
id: install_pandas_at_runtime
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

2) Using one of our pre-built images
id: use_prebuilt_image
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")

3) Building a custom Docker image
If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:
FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion

Then, build the image:
docker build -t kestra-custom:latest .

Finally, use that image in your flow:
id: zip_to_python
namespace: company.team

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
    containerImage: kestra-custom:latest # ⚡️ Use your custom image here
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd

      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"

      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"

Note how the pullPolicy: NEVER property is used to make sure that Kestra uses the local image instead of trying to pull it from DockerHub.
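Because pullPolicy: NEVER skips the pull, the task can only start if the image is already available to the Docker daemon that runs it. The sketch below is one way to check that before triggering the flow. It shells out to docker image inspect, which exits with a non-zero status when the image is not found locally. The function name local_image_exists and its docker_cmd parameter are illustrative assumptions, and the check only reflects the daemon reachable from the machine where you run it, which may differ from the one Kestra uses.

```python
import subprocess


def local_image_exists(image: str, docker_cmd: str = "docker") -> bool:
    """Return True if `docker image inspect` finds `image` locally.

    Hypothetical helper for illustration; `docker_cmd` lets you point at a
    different Docker CLI binary.
    """
    try:
        result = subprocess.run(
            [docker_cmd, "image", "inspect", image],
            stdout=subprocess.DEVNULL,  # the JSON description is not needed
            stderr=subprocess.DEVNULL,
            check=False,                # inspect the exit code ourselves
        )
    except FileNotFoundError:
        # The Docker CLI itself is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    image = "kestra-custom:latest"
    if local_image_exists(image):
        print(f"{image} is available locally")
    else:
        print(f"{image} not found; run: docker build -t {image} .")
```

If the check fails, rebuilding the image with the docker build command shown earlier is usually the simplest fix.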
If you want to run languages other than Python using a custom Docker image, here is an example with Go.