Manage file caching inside of Kestra.
Kestra provides a file caching, which is especially useful when you work with sizeable package dependencies that don't change often.
Cache files in a WorkingDirectory
task
The file caching functionality on the WorkingDirectory
task allows you to cache a subset of files to speed up your workflow execution. This is especially useful when you work with sizeable package dependencies that don't change often.
Kestra can only cache files installed or created as part of the script tasks if the script uses a PROCESS
runner. If the script uses a DOCKER
runner, the files will not be cached and the WorkingDirectory
task will throw an error: Unable to execute WorkingDirectory post actions
.
Use cases for file caching
The file caching is useful if you want to install some pip
or npm
packages before running your script. You can cache the node_modules
or Python venv
folder to avoid re-installing the dependencies on each run.
To do that, add a cache
to your WorkingDirectory
task. The cache
property accepts a list of glob patterns
to match files to cache. The cache will be automatically invalidated after a specified time-to-live using the ttl
property accepting a duration.
id: caching_files
namespace: company.team
tasks:
- id: working_dir
type: io.kestra.plugin.core.flow.WorkingDirectory
cache:
patterns:
- some_directory/**
ttl: PT1H
How does it work under the hood
Kestra packages the files that need to be cached and stores them in the internal storage. When the task is executed again, the cached files are retrieved, initializing the working directory with their contents.
Node.js example
Below is an example of a flow that installs the colors
package before running a Node.js script. The node_modules
folder is cached for one hour.
id: node_cached_dependencies
namespace: company.team
tasks:
- id: working_dir
type: io.kestra.plugin.core.flow.WorkingDirectory
cache:
patterns:
- node_modules/**
ttl: PT1H
tasks:
- id: node_script
type: io.kestra.plugin.scripts.node.Script
beforeCommands:
- npm install colors
script: |
const colors = require("colors");
console.log(colors.red("Hello"));
Python example
Below is an example of a flow that installs the pandas
package before running a Python script. The venv
folder is cached for one day.
id: python_cached_dependencies
namespace: company.team
tasks:
- id: working_dir
type: io.kestra.plugin.core.flow.WorkingDirectory
tasks:
- id: python_script
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.core.runner.Process
warningOnStdErr: false
beforeCommands:
- python -m venv venv
- source venv/bin/activate
- pip install pandas
script: |
import pandas as pd
print(pd.__version__)
cache:
patterns:
- venv/**
ttl: PT24H
How to invalidate the cache
Below are the details how to invalidate the cache:
- After the first run, the files are cached
- The next time the task is executed:
- If the
ttl
didn't pass, then the files are retrieved from cache. - If the
ttl
passed, then the cache is invalidated and no files will be retrieved from cache; because cache is no longer present, thenpm install
command from thebeforeCommands
property will take a bit longer to execute.
- If the
- If you edit the task and change the
ttl
to:- a longer duration e.g.,
PT5H
— the files will be cached for five hours using the newttl
duration - a shorter duration e.g.,
PT5M
— the cache will be invalidated after five minutes using the newttl
duration.
- a longer duration e.g.,
The ttl
is evaluated at runtime. If the most recently set ttl
duration has passed as compared to the last task run execution date, the cache is invalidated and the files are no longer retrieved from cache.
Was this page helpful?