Fetch data from Apache Druid and transform it in Python with Pandas

About this blueprint

This flow will:

Run a SQL query to fetch data from Apache Druid.
Store the query results in a CSV file.
Read the CSV file and process it using Pandas.

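To set up Apache Druid locally, follow the instructions in the Apache Druid documentation. The query in this flow expects a datasource named wikipedia to be loaded, and the JDBC URL assumes Druid is reachable at localhost:8888.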
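
Before triggering the flow, it can help to confirm that Druid's SQL endpoint responds. The following is a minimal sketch, not part of the blueprint: it assumes Druid's SQL HTTP API is reachable on the same host and port as the JDBC URL in the flow (localhost:8888), and the helper names build_sql_request and run_sql are made up for illustration.

```python
import json
import urllib.request

# Assumption: Druid is reachable on localhost:8888, the same host and port
# that appear in the JDBC URL of the flow.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"


def build_sql_request(sql: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build a POST request for Druid's SQL HTTP endpoint (hypothetical helper)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql: str, url: str = DRUID_SQL_URL, timeout: float = 10.0) -> list:
    """Send the query and return the decoded JSON rows (needs a running Druid)."""
    with urllib.request.urlopen(build_sql_request(sql, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call, once Druid is up and the wikipedia datasource is loaded:
# rows = run_sql("SELECT COUNT(*) AS edits FROM wikipedia")
```

If the call fails with a connection error, Druid is probably not running or is listening on a different port than the one assumed here.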

yaml

id: druid_to_pandas
namespace: blueprint

tasks:
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    sql: |
      SELECT __time as edit_time, channel, page, user, delta, added, deleted
      FROM wikipedia
    # store: true writes the result set to internal storage and exposes it as outputs.query_druid.uri
    store: true

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.CsvWriter
    from: "{{ outputs.query_druid.uri }}"

  - id: process_using_pandas
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install pandas > /dev/null
    script: |
      import pandas as pd

      df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
      # A bare df.head() prints nothing outside a REPL, so print it to the task logs
      print(df.head())
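
The Pandas step in the flow only previews the data. As a rough sketch of what further processing could look like, the snippet below aggregates edits per channel. It runs on a few made-up sample rows in place of the real CSV: the column names follow the SELECT list in the flow, but the values, the timestamp format, and the helper name summarize_edits are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the CSV produced by write_to_csv.
# Column names match the SELECT list in the flow; the values are made up.
SAMPLE_CSV = """edit_time,channel,page,user,delta,added,deleted
2016-06-27T00:00:11.000Z,#en.wikipedia,Page A,user_1,36,36,0
2016-06-27T00:01:03.000Z,#en.wikipedia,Page B,user_2,-12,0,12
2016-06-27T01:15:40.000Z,#fr.wikipedia,Page C,user_3,100,100,0
2016-06-27T01:20:05.000Z,#en.wikipedia,Page A,user_1,4,4,0
"""


def summarize_edits(csv_source) -> pd.DataFrame:
    """Count edits and sum character changes per channel."""
    df = pd.read_csv(csv_source, parse_dates=["edit_time"])
    return (
        df.groupby("channel")
        .agg(
            edits=("page", "count"),
            net_delta=("delta", "sum"),
            total_added=("added", "sum"),
        )
        .sort_values("edits", ascending=False)
        .reset_index()
    )


summary = summarize_edits(io.StringIO(SAMPLE_CSV))
print(summary)
```

Inside the flow, the same function could be given the file path from the write_to_csv output in place of the in-memory sample.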
