Source
yaml
id: parquet-duckdb-to-excel
namespace: company.team
tasks:
- id: parquet_duckdb
type: io.kestra.plugin.jdbc.duckdb.Query
sql: >
INSTALL parquet; LOAD parquet; INSTALL httpfs; LOAD httpfs; SELECT * FROM
read_parquet('https://huggingface.co/datasets/kestra/datasets/resolve/main/jaffle-large/raw_items.parquet?download=true')
LIMIT 1000000;
fetchType: STORE
- id: duckdb_to_excel
type: io.kestra.plugin.serdes.excel.IonToExcel
from: "{{ outputs.parquet_duckdb.uri }}"
About this blueprint
DuckDB
This flow will extract a Parquet file from a remote location, transform it using DuckDB and export it in Excel format. The Parquet file is a dataset of 4.5 million items from a fictional e-commerce store. Because Excel has a limit of roughly 1 million rows, we will limit the number of query results to 1 million rows in order to export it in Excel format.
More Related Blueprints