Source
yaml
id: druid-to-pandas
namespace: company.team
tasks:
- id: query_druid
type: io.kestra.plugin.jdbc.druid.Query
url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
sql: |
SELECT __time as edit_time, channel, page, user, delta, added, deleted
FROM wikipedia
fetchType: STORE
- id: write_to_csv
type: io.kestra.plugin.serdes.csv.IonToCsv
from: "{{ outputs.query_druid.uri }}"
- id: process_using_pandas
type: io.kestra.plugin.scripts.python.Script
beforeCommands:
- pip install pandas > /dev/null
script: |
import pandas as pd
df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
df.head()
About this blueprint
Python SQL
This flow will:
- Run a SQL query to fetch data from Apache Druid. 2. Store the query results in a CSV file. 3. Read the CSV file and process it using Pandas. To set up Apache Druid locally, follow the instructions mentioned in the following documentation page.
More Related Blueprints