Run multiple tasks in the same working directory sequentially.
This task allows you to run multiple tasks sequentially in the same working directory. It is useful when you want to share files from Namespace Files or from a Git repository across multiple tasks.
When to use the WorkingDirectory task
By default, all Kestra tasks are stateless. If one task generates files, those files won't be available to downstream tasks unless they are persisted in internal storage. When a task completes, its temporary directory is purged. This behavior is generally useful: it keeps your environment clean and dependency-free, and it avoids potential privacy or security issues that could arise from exposing data generated by one task to other processes.
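As a loose local analogy for this default (an illustrative sketch, not how Kestra is implemented), the shell snippet below gives each simulated task its own temporary directory and deletes it afterwards, then repeats the same two steps in one shared directory. The helper name run_stateless and the file data.csv are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for a stateless task: run a command in a fresh
# temporary directory, then purge that directory.
# The function body is a subshell, so "dir" and the "cd" stay local to it.
run_stateless() (
  dir="$(mktemp -d)"
  cd "$dir"
  sh -c "$1" || true
  cd /
  rm -rf "$dir"
)

# First "task" writes a file; second "task" looks for it in its own directory.
run_stateless 'echo "a,b" > data.csv'
stateless_result="$(run_stateless 'cat data.csv 2>/dev/null || echo "data.csv not found"')"
echo "stateless: $stateless_result"

# The same two steps in one shared directory: the file is still there.
shared="$(mktemp -d)"
(cd "$shared" && sh -c 'echo "a,b" > data.csv')
shared_result="$(cd "$shared" && sh -c 'cat data.csv')"
echo "shared: $shared_result"
rm -rf "$shared"
```

In the first case the second step cannot find the file, because it was written to a directory that no longer exists. In the second case both steps see the same file system.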
Despite the benefits of stateless execution, statefulness is desirable in certain scenarios. Imagine that you want to execute several Python scripts, each of which generates some output data, and another script then combines that data as part of an ETL/ML process. Executing those related tasks in the same working directory and sharing state between them is helpful for the following reasons:
- You can attach namespace files to the WorkingDirectory task and use them in all downstream tasks. This allows you to work the same way you would work on your local machine, where you can import modules from the same directory.
- Within a WorkingDirectory, you can clone your entire GitHub branch with multiple modules and configuration files needed to run several scripts and reuse them across multiple downstream tasks.
- You can execute multiple scripts sequentially on the same worker or in the same container, minimizing latency.
- Output artifacts of each task (such as CSV, JSON or Parquet files you generate in your script) are directly available to other tasks without having to persist them within the internal storage. This is because all child tasks of the WorkingDirectory task share the same file system.
The WorkingDirectory task allows you to:
- Share files from Namespace Files or from a Git repository across multiple tasks
- Run multiple tasks sequentially in the same working directory
- Share data across multiple tasks without having to persist it in internal storage.
Example
In this example, the flow sequentially executes Shell Scripts and Shell Commands in the same working directory using a local Process Task Runner.
id: shell_scripts
namespace: company.team
tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: create_csv_file
        type: io.kestra.plugin.scripts.shell.Script
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        script: |
          #!/bin/bash
          echo "Column1,Column2,Column3" > file.csv
          for i in {1..10}
          do
            echo "$i,$RANDOM,$RANDOM" >> file.csv
          done
      - id: inspect_file
        type: io.kestra.plugin.scripts.shell.Commands
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cat file.csv
      - id: filter_file
        type: io.kestra.plugin.scripts.shell.Commands
        description: select only the first five rows of the second column
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        commands:
          - cut -d ',' -f 2 file.csv | head -n 6
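
To check what the three tasks produce, the same commands can be replayed locally in a single temporary directory. This is a sketch that assumes bash is installed. The variables start_dir, workdir, filtered, and line_count are introduced here for illustration and are not part of the flow. Because file.csv starts with a header row, head -n 6 returns the header plus the first five data rows.

```shell
#!/bin/sh
set -eu

start_dir="$PWD"
workdir="$(mktemp -d)"   # stands in for the shared working directory
cd "$workdir"

# create_csv_file: the script body from the flow, run by bash as a separate process
bash <<'EOF'
echo "Column1,Column2,Column3" > file.csv
for i in {1..10}
do
  echo "$i,$RANDOM,$RANDOM" >> file.csv
done
EOF

# inspect_file: a new process still sees file.csv because the directory is shared
cat file.csv

# filter_file: second column, header plus the first five data rows
filtered="$(cut -d ',' -f 2 file.csv | head -n 6)"
echo "$filtered"

# Header plus ten data rows
line_count="$(wc -l < file.csv | tr -d ' ')"
echo "file.csv has $line_count lines"

# Clean up the temporary directory
cd "$start_dir"
rm -rf "$workdir"
```

Each step runs as its own process and shares nothing but the directory, which is the same property the child tasks of WorkingDirectory rely on.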