Fetch data from Apache Druid and transform it in Python with Pandas

Source

id: druid-to-pandas
namespace: company.team

tasks:
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    sql: |
      SELECT __time as edit_time, channel, page, user, delta, added, deleted
      FROM wikipedia
    fetchType: STORE

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_druid.uri }}"

  - id: process_using_pandas
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install pandas > /dev/null
    script: |
      import pandas as pd

      df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
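      # Print the first rows so they appear in the task logs;
      # a bare df.head() produces no output when run as a script.
      print(df.head())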

About this blueprint

Tags: Python, SQL

This flow will:

  1. Run a SQL query to fetch data from Apache Druid.
  2. Store the query results in a CSV file.
  3. Read the CSV file and process it using Pandas.

To set up Apache Druid locally, follow the instructions in the Apache Druid documentation.
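
The Pandas step in the flow only loads the CSV and previews it. As a minimal sketch of what a transformation in that step could look like, the example below aggregates edits per channel using the column names from the SQL query. The function name `summarize_edits` and the sample rows are hypothetical and made up for illustration; in the flow, the CSV source would be the file produced by `write_to_csv` instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the CSV produced by write_to_csv.
# The values are invented for illustration and are not real query results.
SAMPLE_CSV = """edit_time,channel,page,user,delta,added,deleted
2016-06-27T00:00:11.000Z,#en.wikipedia,Page A,alice,10,10,0
2016-06-27T00:05:00.000Z,#en.wikipedia,Page B,bob,-4,0,4
2016-06-27T01:00:00.000Z,#fr.wikipedia,Page C,carol,7,7,0
"""


def summarize_edits(csv_source):
    """Return edit counts and character totals per channel.

    csv_source can be a file path or any file-like object that
    pandas.read_csv accepts.
    """
    df = pd.read_csv(csv_source, parse_dates=["edit_time"])
    summary = (
        df.groupby("channel", as_index=False)
        .agg(
            edits=("page", "count"),          # number of edit rows
            total_added=("added", "sum"),     # characters added
            total_deleted=("deleted", "sum"),  # characters deleted
            net_delta=("delta", "sum"),       # net change in characters
        )
        .sort_values("net_delta", ascending=False)
        .reset_index(drop=True)
    )
    return summary


summary = summarize_edits(io.StringIO(SAMPLE_CSV))
print(summary)
```

Inside the flow, the same function could be called with `"{{ outputs.write_to_csv.uri }}"` in place of the in-memory sample, assuming the query returns the columns listed above.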
