Extract data from a REST API, process it in Python with Polars in a Docker container, then run DuckDB query and preview results as a table in the Outputs tab

About this blueprint

API DuckDB Outputs Python SQL

This flow will download a file from a REST API, process it with Python and SQL, and store the result in the internal storage.

the api http Request task uses a public API — to interact with a private API endpoint, check the task documentation for examples on how to authenticate your request
the python task runs in a separate Docker container and installs polars before starting the script
the DuckDB query task uses data from a previous task and outputs the query result to internal storage.

The Outputs tab provides a well-formatted table with the query results.

yaml

id: api_python_sql
namespace: company.team

tasks:
  - id: api
    type: io.kestra.plugin.core.http.Request
    uri: https://dummyjson.com/products

  - id: python
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: python:slim
    beforeCommands:
      - pip install polars
    outputFiles:
      - products.csv
    warningOnStdErr: false
    script: |
      import polars as pl
      data = {{ outputs.api.body | jq('.products') | first }}
      df = pl.from_dicts(data)
      df.glimpse()
      df.select(["brand", "price"]).write_csv("products.csv")

  - id: sql_query
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      in.csv: "{{ outputs.python.outputFiles['products.csv'] }}"
    sql: |
      SELECT brand, round(avg(price), 2) as avg_price
      FROM read_csv_auto('{{ workingDir }}/in.csv', header=True)
      GROUP BY brand
      ORDER BY avg_price DESC;
    store: true

Request

Script

Query

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra