There are several ways of installing custom packages for your workflows. This page shows how to install dependencies at runtime using the beforeCommands property.

Installing dependencies using beforeCommands

While you could bake all your package dependencies into a custom container image, often it's convenient to install a couple of additional packages at runtime without having to build separate images. The beforeCommands can be used for that purpose.

pip install package

Here is a simple example installing pip packages requests and kestra before starting the script:

yaml
id: pip
namespace: dev

tasks:
  - id: before_commands
    type: io.kestra.plugin.scripts.python.Script
    docker:
      image: python:3.11-slim
    beforeCommands:
      - pip install requests kestra > /dev/null
    script: |
      import requests
      import kestra

      kestra_modules = [i for i in dir(kestra.Kestra) if not i.startswith("_")]

      print(f"Requests version: {requests.__version__}")
      print(f"Kestra modules: {kestra_modules}")

pip install -r requirements.txt

This example clones a Git repository that contains a requirements.txt file. The script task uses beforeCommands to install those packages. We then list recently installed packages to validate that this process works as expected:

yaml
id: python_requirements_file
namespace: dev

tasks:
  - id: wdir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: cloneRepository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/examples
        branch: main

      - id: print_requirements
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cat requirements.txt

      - id: list_installed_packages
        type: io.kestra.plugin.scripts.python.Commands
        warningOnStdErr: false
        docker:
          image: python:3.11-slim
        beforeCommands:
          - pip install -r requirements.txt > /dev/null
        commands:
          - ls -lt $(python -c "import site; print(site.getsitepackages()[0])") | head -n 20

And here is a simple version where we add the requirements.txt file using the inputFiles property:

yaml
id: python_requirements_file
namespace: dev

tasks:
  - id: list_installed_packages
    type: io.kestra.plugin.scripts.python.Script
    env:
      PIP_ROOT_USER_ACTION: ignore
    inputFiles:
      requirements.txt: |
        polars
        requests
        kestra
    docker:
      image: python:3.11-slim
    beforeCommands:
      - pip install --upgrade pip
      - pip install -r requirements.txt > /dev/null
    script: |
      from kestra import Kestra
      import pkg_resources
      import re

      with open('requirements.txt', 'r') as file:
          # find package names without versions
          required_packages = {re.match(r'^\s*([a-zA-Z0-9_-]+)', line).group(1) for line in file if line.strip()}

      installed_packages = [(d.project_name, d.version) for d in pkg_resources.working_set]

      kestra_outputs = {}

      for name, version in installed_packages:
          if name in required_packages:
              kestra_outputs[name] = version

      Kestra.outputs(kestra_outputs)

As you can see here, the WorkingDirectory task is usually only needed if you use the git.Clone task. In most other cases, you can use the inputFiles property to add files to the script's working directory.

Using Kestra's prebuilt images

Many data engineering use cases require performing fairly standardized tasks such as:

  • processing data with pandas
  • transforming data with dbt-core (using a dbt adapter for your data warehouse)
  • making API calls with the requests library, etc.

To solve those common challenges, the kestra-io/examples repository provides several public Docker images with the latest versions of those common packages. Many Blueprints use those public images by default. The images are hosted in GitHub Container Registry managed by Kestra's team and those images follow the naming ghcr.io/kestra-io/packageName:latest.

Example: running R script in Docker

Here is a simple example using the ghcr.io/kestra-io/rdata:latest Docker image provided by Kestra to analyze the built-in mtcars dataset using dplyr and arrow R libraries:

yaml
id: rCars
namespace: dev

tasks:
  - id: r
    type: io.kestra.plugin.scripts.r.Script
    warningOnStdErr: false
    docker:
      image: ghcr.io/kestra-io/rdata:latest
    outputFiles:
      - "*.csv"
      - "*.parquet"
    script: |
      library(dplyr)
      library(arrow)
      data(mtcars) # Load mtcars data
      print(head(mtcars))
      final <- mtcars %>%
        summarise(
          avg_mpg = mean(mpg),
          avg_disp = mean(disp),
          avg_hp = mean(hp),
          avg_drat = mean(drat),
          avg_wt = mean(wt),
          avg_qsec = mean(qsec),
          avg_vs = mean(vs),
          avg_am = mean(am),
          avg_gear = mean(gear),
          avg_carb = mean(carb)
        )
      final %>% print()
      write.csv(final, "final.csv")
      mtcars_clean <- na.omit(mtcars) # remove rows with NA values
      write_parquet(mtcars_clean, "mtcars_clean.parquet")

Installation of R libraries is time-consuming. From a technical standpoint, you could install custom R packages at runtime as follows:

yaml
id: rCars
namespace: dev

tasks:
  - id: r
    type: io.kestra.plugin.scripts.r.Script
    warningOnStdErr: false
    docker:
      image: ghcr.io/kestra-io/rdata:latest
    beforeCommands:
      - Rscript -e "install.packages(c('dplyr', 'arrow'))" > /dev/null 2>&1

However, that flow above might take up to 30 minutes, depending on the R packages you install.

Prebuilt Docker images such as ghcr.io/kestra-io/rdata:latest can help you iterate much faster. Before moving to production, you can build your custom images with the exact package versions that you need.

Was this page helpful?