Source

```yaml
id: caching
namespace: company.team

tasks:
  - id: transactions
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/transactions.csv

  - id: products
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/csv/cache_demo/products.csv
    description: This task pulls the full product catalog once per day. Because the
      catalog changes infrequently and contains over 200k rows, running it only
      daily avoids unnecessary strain on that production DB, while ensuring
      downstream joins always use up-to-date reference data.
    taskCache:
      enabled: true
      ttl: PT24H

  - id: duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    store: true
    inputFiles:
      products.csv: "{{ outputs.products.uri }}"
      transactions.csv: "{{ outputs.transactions.uri }}"
    sql: |-
      SELECT
        t.transaction_id,
        t.timestamp,
        t.quantity,
        t.sale_price,
        p.product_name,
        p.category,
        p.cost_price,
        p.supplier_id,
        (t.sale_price - p.cost_price) * t.quantity AS profit
      FROM
        read_csv_auto('transactions.csv') AS t
      JOIN
        read_csv_auto('products.csv') AS p
      USING (product_id);
```
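Kestra manages the cache internally, but the TTL semantics behind `taskCache` can be sketched in plain Python: if a cached result exists and is younger than the TTL, reuse it; otherwise re-run the task and store the new result. The `TtlCache` class and `download_products` function below are illustrative stand-ins, not part of Kestra's API.

```python
import time

# Minimal sketch of TTL-based task caching, analogous to Kestra's taskCache
# (illustrative only; Kestra handles this internally).
class TtlCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds   # PT24H corresponds to 24 * 3600 seconds
        self._store = {}         # task_id -> (timestamp, result)

    def run(self, task_id, task_fn, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(task_id)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]      # cache hit: skip the expensive task
        result = task_fn()       # cache miss or expired TTL: re-run
        self._store[task_id] = (now, result)
        return result

calls = []
def download_products():
    calls.append(1)              # count how often the task really runs
    return "products.csv contents"

cache = TtlCache(ttl_seconds=24 * 3600)
cache.run("products", download_products, now=0)          # first call: runs
cache.run("products", download_products, now=3600)       # within TTL: cached
cache.run("products", download_products, now=25 * 3600)  # TTL expired: re-runs
print(len(calls))  # 2
```

Passing `now` explicitly just makes the sketch deterministic; in real use the wall clock drives expiry.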
About this blueprint
SQL Kestra Database
This flow illustrates Kestra's taskCache feature by caching a task that extracts a large product catalog, reducing load on the source system.
- The `transactions` task downloads recent transaction data without caching.
- The `products` task downloads the full product catalog and caches the result for 24 hours using the `taskCache` property, ensuring that downstream tasks use fresh data while avoiding repeated downloads within the TTL.
- The `duckdb` task joins the transaction and product data using DuckDB SQL, calculates profit per transaction, and stores the result.
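The join-and-profit logic in the `duckdb` task can be mirrored in plain Python for a quick sanity check. The two CSV snippets below are made-up samples for illustration (the real flow reads the downloaded files), and the profit formula matches the SQL: `(sale_price - cost_price) * quantity`.

```python
import csv
import io

# Tiny made-up samples standing in for the downloaded CSV files.
products_csv = """product_id,product_name,category,cost_price,supplier_id
P1,Widget,Tools,2.50,S1
P2,Gadget,Toys,4.00,S2
"""
transactions_csv = """transaction_id,timestamp,product_id,quantity,sale_price
T1,2024-01-01T10:00:00,P1,3,4.00
T2,2024-01-01T11:00:00,P2,1,6.50
"""

# Build a product_id -> row lookup, then join each transaction against it,
# computing (sale_price - cost_price) * quantity as in the DuckDB query.
products = {r["product_id"]: r for r in csv.DictReader(io.StringIO(products_csv))}

results = []
for t in csv.DictReader(io.StringIO(transactions_csv)):
    p = products[t["product_id"]]
    profit = (float(t["sale_price"]) - float(p["cost_price"])) * int(t["quantity"])
    results.append({"transaction_id": t["transaction_id"],
                    "product_name": p["product_name"],
                    "profit": round(profit, 2)})

print(results)
# [{'transaction_id': 'T1', 'product_name': 'Widget', 'profit': 4.5},
#  {'transaction_id': 'T2', 'product_name': 'Gadget', 'profit': 2.5}]
```

DuckDB's `read_csv_auto` with `USING (product_id)` does the same equi-join declaratively, and handles 200k-row files far more efficiently than a Python loop.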