Blueprints

Extract and transform a Parquet file using DuckDB and export it in Excel format

About this blueprint

DuckDB Local files

This flow will extract a Parquet file from a remote location, transform it using DuckDB and export it in Excel format. The Parquet file is a dataset of 4.5 million items from a fictional e-commerce store. Because Excel has a limit of roughly 1 million rows, we will limit the number of query results to 1 million rows in order to export it in Excel format.

yaml
id: parquet_duckdb_to_excel
namespace: blueprints

tasks:
  - id: parquet_duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
      INSTALL parquet;
      LOAD parquet;
      INSTALL httpfs;
      LOAD httpfs;
      SELECT * 
      FROM read_parquet('https://huggingface.co/datasets/kestra/datasets/resolve/main/jaffle-large/raw_items.parquet?download=true')
      LIMIT 1000000;
    store: true

  - id: duckdb_to_excel
    type: io.kestra.plugin.serdes.excel.IonToExcel
    from: "{{ outputs.parquet.uri }}"

Query

Ion To Excel

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra