Blueprints

Fetch data from Apache Druid and transform it in Python with Pandas

About this blueprint

Python SQL

This flow will:

  1. Run a SQL query to fetch data from Apache Druid.
  2. Store the query results in a CSV file.
  3. Read the CSV file and process it using Pandas.

To set up Apache Druid locally, follow the instructions mentioned in the following documentation page.

yaml
id: druid_to_pandas
namespace: blueprint

tasks:
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    sql: |
      SELECT __time as edit_time, channel, page, user, delta, added, deleted
      FROM wikipedia
    fetch: true
    store: true

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.CsvWriter
    from: "{{ outputs.query_druid.uri }}" 

  - id: process_using_pandas
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install pandas > /dev/null
    script: |
      import pandas as pd

      df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
      df.head()

Query

Csv Writer

Script

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra