Run multiple tasks in the same working directory sequentially.

This task allows you to run multiple tasks sequentially in the same working directory. It is useful when you want to share files from Namespace Files or from a Git repository across multiple tasks.

When to use the WorkingDirectory task

By default, all Kestra tasks are stateless. If one task generates files, those files won’t be available in downstream tasks unless they are persisted in internal storage. When a task completes, its temporary directory is purged. This behavior is generally useful: it keeps your environment clean and dependency-free, and it avoids potential privacy or security issues that could arise from exposing data generated by one task to other processes.
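As a rough sketch of what the stateless default implies, the flow below hands a file from one task to the next without a shared directory: the first task declares the file under `outputFiles` so it is persisted in internal storage, and the second pulls it back in through `inputFiles`. The flow id and task ids are illustrative assumptions, not taken from the original page.

```yaml
id: stateless_handoff   # hypothetical flow id
namespace: company.team

tasks:
  - id: create_csv_file
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    # Persist file.csv in internal storage before the task's directory is purged
    outputFiles:
      - file.csv
    commands:
      - echo "Column1,Column2" > file.csv

  - id: inspect_file
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    # Explicitly fetch the persisted file into this task's own directory
    inputFiles:
      file.csv: "{{ outputs.create_csv_file.outputFiles['file.csv'] }}"
    commands:
      - cat file.csv
```

Every file that crosses a task boundary has to be declared this way, which is the overhead a shared working directory removes.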

Despite the benefits of stateless execution, statefulness is desirable in certain scenarios. Imagine that you want to execute several Python scripts, each of which generates some output data, and that another script combines that data as part of an ETL/ML process. Executing those related tasks in the same working directory and sharing state between them is helpful for the following reasons:

  • You can attach namespace files to the WorkingDirectory task and use them in all downstream tasks. This allows you to work the same way you would work on your local machine, where you can import modules from the same directory.
  • Within a WorkingDirectory, you can clone an entire branch of a Git repository, with the modules and configuration files needed to run several scripts, and reuse them across multiple downstream tasks.
  • You can execute multiple scripts sequentially on the same worker or in the same container, minimizing latency.
  • Output artifacts of each task (such as CSV, JSON or Parquet files you generate in your script) are directly available to other tasks without having to persist them within the internal storage. This is because all child tasks of the WorkingDirectory task share the same file system.
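The first point above can be sketched as follows. This is a minimal example that assumes two hypothetical Python scripts, `extract.py` and `transform.py`, already exist as namespace files in `company.team`, and that Python is installed on the worker (the Process task runner executes commands on the host).

```yaml
id: namespace_files_wdir   # hypothetical flow id
namespace: company.team

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    # Load the namespace files into the shared working directory
    namespaceFiles:
      enabled: true
    tasks:
      - id: extract
        type: io.kestra.plugin.scripts.python.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - python extract.py     # hypothetical namespace file

      - id: transform
        type: io.kestra.plugin.scripts.python.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - python transform.py   # hypothetical; can read files written by extract.py
```

Because both child tasks run in the same directory, anything `extract.py` writes to disk is visible to `transform.py` without declaring output or input files.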

The WorkingDirectory task allows you to:

  1. Share files from Namespace Files or from a Git repository across multiple tasks
  2. Run multiple tasks sequentially in the same working directory
  3. Share data across multiple tasks without having to persist it in internal storage
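For the Git case, a sketch along these lines may help. It assumes the Git plugin's `Clone` task is available in your Kestra instance; the repository URL and the script path are placeholders, not real resources.

```yaml
id: git_wdir   # hypothetical flow id
namespace: company.team

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/your-org/your-repo   # placeholder URL
        branch: main

      - id: run_script
        type: io.kestra.plugin.scripts.python.Commands
        commands:
          - python etl/main.py   # placeholder path inside the cloned repository
```

The clone task populates the shared working directory, so every later child task sees the repository contents as local files.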

For more detail, check out the plugin documentation.

Example

In this example, the flow runs one shell Script task and two shell Commands tasks sequentially in the same working directory, using the local Process task runner.

```yaml
id: shell_scripts
namespace: company.team

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: create_csv_file
        type: io.kestra.plugin.scripts.shell.Script
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        script: |
          #!/bin/bash
          echo "Column1,Column2,Column3" > file.csv
          for i in {1..10}
          do
            echo "$i,$RANDOM,$RANDOM" >> file.csv
          done

      - id: inspect_file
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cat file.csv

      - id: filter_file
        type: io.kestra.plugin.scripts.shell.Commands
        description: select the header and the first five rows of the second column
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cut -d ',' -f 2 file.csv | head -n 6
```
