Extract and transform a Parquet file using DuckDB and export it in Excel format

Source

yaml

id: parquet-duckdb-to-excel
namespace: company.team

tasks:
  - id: parquet_duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
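      -- Enable Parquet support and HTTP(S) access for reading the remote file
      INSTALL parquet;
      LOAD parquet;
      INSTALL httpfs;
      LOAD httpfs;
      -- Cap the result at 1,000,000 rows so it fits in a single Excel worksheet
      SELECT *
      FROM read_parquet('https://huggingface.co/datasets/kestra/datasets/resolve/main/jaffle-large/raw_items.parquet?download=true')
      LIMIT 1000000;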
    fetchType: STORE

  - id: duckdb_to_excel
    type: io.kestra.plugin.serdes.excel.IonToExcel
    from: "{{ outputs.parquet_duckdb.uri }}"

About this blueprint


This flow reads a Parquet file from a remote URL, queries it with DuckDB, and exports the result as an Excel file.

The Parquet file is a dataset of 4.5 million items from a fictional e-commerce store.

An Excel worksheet holds at most 1,048,576 rows, so the query caps its result at 1 million rows to keep the export within that limit.
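To see how many rows the row cap leaves out, the source file can be counted before exporting. The flow below is a minimal sketch, not part of the blueprint: the flow id `parquet-duckdb-row-count`, the task id `count_rows`, and the column alias `row_count` are hypothetical names, and it assumes the DuckDB Query task accepts `fetchType: FETCH_ONE` to return a single row in the task outputs instead of storing the result as a file.

```yaml
id: parquet-duckdb-row-count   # hypothetical flow id, for illustration only
namespace: company.team

tasks:
  - id: count_rows             # hypothetical task id
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
      INSTALL httpfs;
      LOAD httpfs;
      -- Count every row in the remote file, with no LIMIT applied
      SELECT count(*) AS row_count
      FROM read_parquet('https://huggingface.co/datasets/kestra/datasets/resolve/main/jaffle-large/raw_items.parquet?download=true');
    fetchType: FETCH_ONE       # assumption: return one row rather than a stored file
```

If the count exceeds 1,048,576, a single worksheet cannot hold the full dataset, and the `LIMIT` in the main flow determines how much of it is exported.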

Query

Ion To Excel
