Blueprints

Extract data from Apache Pinot into a CSV format and load it into BigQuery

Source

yaml
id: pinot-to-bigquery
namespace: company.team

tasks:
  - id: query_pinot
    type: io.kestra.plugin.jdbc.pinot.Query
    url: jdbc:pinot://192.168.0.106:9000
    sql: |
      SELECT lat, lon, name, address, cast(location_st_point as string)
      FROM starbucksStores
    fetchType: STORE

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_pinot.uri }}"

  - id: load_bigquery
    type: io.kestra.plugin.gcp.bigquery.Load
    from: "{{ outputs.write_to_csv.uri }}"
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    projectId: my_project
    destinationTable: my_project.my_dataset.my_table
    format: CSV
    csvOptions:
      fieldDelimiter: ","
      skipLeadingRows: 1
    schema:
      fields:
        - name: lat
          type: FLOAT64
        - name: lon
          type: FLOAT64
        - name: name
          type: STRING
        - name: address
          type: STRING
        - name: location_st_point
          type: STRING

About this blueprint

Ingest BigQuery SQL

This flow extracts data from an Apache Pinot table, writes it to a CSV file, and then loads the CSV data into BigQuery. Note that we convert the bytes column location_st_point into a STRING format while extracting the data from Pinot. The GCP credentials are configured using an environment variable GCP_SERVICE_ACCOUNT_JSON. The BigQuery table will get created at runtime using the schema provided in the flow code. You can setup Apache Pinot locally with Docker using: bash docker pull apachepinot/pinot:latest docker run -p 9000:9000 apachepinot/pinot:latest QuickStart -type batch The Quickstart mode will automatically load the example datasets. One of the example datasets starbucksStores is used in this flow.

Query

Ion To Csv

Load

FLOAT64

STRING

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra