Fetch data from Apache Druid and transform it in Python with Pandas

Source

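```yaml
id: druid-to-pandas
namespace: company.team

tasks:
  # 1. Query Druid over JDBC and store the result in Kestra's internal storage (Ion format)
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    # "user" is quoted because USER is a reserved word in Druid SQL
    sql: |
      SELECT __time AS edit_time, channel, page, "user", delta, added, deleted
      FROM wikipedia
    fetchType: STORE

  # 2. Convert the stored Ion file to CSV
  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_druid.uri }}"

  # 3. Load the CSV with pandas and preview the first rows
  - id: process_using_pandas
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install pandas > /dev/null
    script: |
      import pandas as pd

      df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
      # print() is needed to see the preview in the task logs;
      # a bare df.head() only displays output in a notebook
      print(df.head())
```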
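The last task only previews the data. As a sketch of what further processing in that script might look like, the hypothetical `summarize_edits` function below aggregates edit activity per channel. The sample rows are invented for illustration and only mirror the columns selected by the Druid query; in the flow, the script would read the file referenced by `{{ outputs.write_to_csv.uri }}` as it already does.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the CSV produced by write_to_csv.
# Column names come from the Druid query; the values are made up.
SAMPLE_CSV = """edit_time,channel,page,user,delta,added,deleted
2015-09-12T00:46:58.771Z,#en.wikipedia,Page A,alice,31,31,0
2015-09-12T00:47:00.496Z,#en.wikipedia,Page B,bob,-5,0,5
2015-09-12T00:47:05.474Z,#fr.wikipedia,Page C,carol,200,200,0
"""


def summarize_edits(csv_source) -> pd.DataFrame:
    """Hypothetical helper: load the exported CSV and aggregate edits per channel."""
    df = pd.read_csv(csv_source, parse_dates=["edit_time"])
    summary = (
        df.groupby("channel")
        .agg(
            edits=("page", "count"),        # number of edits per channel
            added=("added", "sum"),         # characters added
            deleted=("deleted", "sum"),     # characters deleted
            net_delta=("delta", "sum"),     # net change in size
        )
        .sort_values("edits", ascending=False)
    )
    return summary


if __name__ == "__main__":
    # In the flow, read_csv would receive the rendered CSV output instead of this sample.
    print(summarize_edits(io.StringIO(SAMPLE_CSV)))
```

Named aggregation keeps the output columns self-describing, which is convenient if the summary is later written back out as a CSV file.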

About this blueprint


This flow will:

1. Run a SQL query to fetch data from Apache Druid.
2. Convert the query results, which Kestra stores in Ion format, to a CSV file.
3. Read the CSV file and process it using Pandas.

To set up Apache Druid locally, follow the instructions in the Apache Druid documentation.
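For quick local testing outside Kestra, the three steps above can be approximated in plain Python. This is a sketch, not part of the blueprint: it assumes a Druid router at `http://localhost:8888` (the host used in the flow's JDBC URL) and calls Druid's SQL HTTP API at `/druid/v2/sql/` with `resultFormat` set to `csv` and `header` set to `true`, so Druid returns CSV directly and the Ion conversion step is skipped. The names `build_sql_request` and `fetch_as_dataframe` are hypothetical.

```python
import io
import json
import urllib.request

import pandas as pd

# Assumption: a local Druid router, as in the flow's JDBC URL.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"

QUERY = """
SELECT __time AS edit_time, channel, page, "user", delta, added, deleted
FROM wikipedia
"""


def build_sql_request(sql: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Step 1 (hypothetical helper): POST the query and ask for CSV with a header row."""
    payload = {"query": sql, "resultFormat": "csv", "header": True}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_as_dataframe(sql: str, url: str = DRUID_SQL_URL) -> pd.DataFrame:
    """Steps 2 and 3 (hypothetical helper): fetch the CSV result and load it into pandas."""
    with urllib.request.urlopen(build_sql_request(sql, url)) as resp:
        csv_text = resp.read().decode("utf-8")
    # read_csv skips blank lines by default, including any trailing empty line.
    return pd.read_csv(io.StringIO(csv_text))
```

With Druid running, `fetch_as_dataframe(QUERY).head()` gives a preview comparable to the flow's last task. In the flow itself, `fetchType: STORE` keeps the result in Kestra's internal storage, which is what exposes `outputs.query_druid.uri` to the downstream tasks.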

Plugins used: Query, Ion To Csv, Script
