Source
yaml
id: druid-to-pandas
namespace: company.team
tasks:
  - id: query_druid
    type: io.kestra.plugin.jdbc.druid.Query
    url: jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/;transparent_reconnection=true
    sql: |
      SELECT __time as edit_time, channel, page, user, delta, added, deleted
      FROM wikipedia
    fetchType: STORE
  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_druid.uri }}"
  - id: process_using_pandas
    type: io.kestra.plugin.scripts.python.Script
    beforeCommands:
      - pip install pandas > /dev/null
    script: |
      import pandas as pd
      df = pd.read_csv("{{ outputs.write_to_csv.uri }}")
      print(df.head())
About this blueprint
Python SQL
This flow will:
1. Run a SQL query to fetch data from Apache Druid.
2. Store the query results in a CSV file.
3. Read the CSV file and process it using Pandas.

To set up Apache Druid locally, follow the instructions in the Apache Druid documentation.
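The flow only previews the first rows of the result. As a rough illustration of what the Pandas step could do next, the sketch below aggregates edits per channel. It is not part of the blueprint: the sample rows and the `summarize_edits` helper are hypothetical, and only the column names are taken from the SELECT list in the flow. Inside the flow, the CSV path rendered from `{{ outputs.write_to_csv.uri }}` would take the place of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the CSV produced by write_to_csv.
# Only the column names come from the flow's SELECT list.
SAMPLE_CSV = """edit_time,channel,page,user,delta,added,deleted
2016-06-27T00:00:11.080Z,#en.wikipedia,Page A,alice,31,31,0
2016-06-27T00:00:17.457Z,#en.wikipedia,Page B,bob,-5,0,5
2016-06-27T00:00:34.959Z,#fr.wikipedia,Page C,carol,12,12,0
"""


def summarize_edits(csv_source):
    """Return one row per channel with its edit count and net character delta."""
    df = pd.read_csv(csv_source, parse_dates=["edit_time"])
    return (
        df.groupby("channel", as_index=False)
        .agg(edits=("page", "count"), net_delta=("delta", "sum"))
        .sort_values("net_delta", ascending=False)
        .reset_index(drop=True)
    )


# In the flow, pass the CSV file path instead of the in-memory sample.
summary = summarize_edits(io.StringIO(SAMPLE_CSV))
print(summary)
```

Printing the result matters in a script task: a bare expression such as `df.head()` produces no output outside an interactive session, so nothing would appear in the task logs.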