Blueprints

Extract and transform a Parquet file using DuckDB and export it in Excel format

Source

yaml
id: parquet-duckdb-to-excel
namespace: company.team

tasks:
  - id: parquet_duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: >
      INSTALL parquet; LOAD parquet; INSTALL httpfs; LOAD httpfs; SELECT *  FROM
      read_parquet('https://huggingface.co/datasets/kestra/datasets/resolve/main/jaffle-large/raw_items.parquet?download=true')
      LIMIT 1000000;
    fetchType: STORE

  - id: duckdb_to_excel
    type: io.kestra.plugin.serdes.excel.IonToExcel
    from: "{{ outputs.parquet_duckdb.uri }}"

About this blueprint

DuckDB

This flow will extract a Parquet file from a remote location, transform it using DuckDB and export it in Excel format. The Parquet file is a dataset of 4.5 million items from a fictional e-commerce store. Because Excel has a limit of roughly 1 million rows, we will limit the number of query results to 1 million rows in order to export it in Excel format.

Query

Ion To Excel

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra