Extract data from Apache Pinot into a CSV format and load it into BigQuery

About this blueprint

BigQuery Ingest SQL

This flow extracts data from an Apache Pinot table, writes it to a CSV file, and then loads the CSV data into BigQuery.

Note that we convert the bytes column location_st_point into a STRING format while extracting the data from Pinot.

The GCP credentials are configured using an environment variable GCP_SERVICE_ACCOUNT_JSON. The BigQuery table will get created at runtime using the schema provided in the flow code.

You can setup Apache Pinot locally with Docker using:

bash

docker pull apachepinot/pinot:latest
docker run -p 9000:9000 apachepinot/pinot:latest QuickStart -type batch

The Quickstart mode will automatically load the example datasets. One of the example datasets starbucksStores is used in this flow.

yaml

id: pinot_to_bigquery
namespace: blueprint

tasks:
  - id: query_pinot
    type: "io.kestra.plugin.jdbc.pinot.Query"
    url: jdbc:pinot://192.168.0.106:9000
    sql: |
      SELECT lat, lon, name, address, cast(location_st_point as string)
      FROM starbucksStores
    fetch: true
    store: true

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.CsvWriter
    from: "{{ outputs.query_pinot.uri }}"
    
  - id: load_bigquery
    type: "io.kestra.plugin.gcp.bigquery.Load"
    from: "{{ outputs.write_to_csv.uri }}"
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    projectId: "my_project"
    destinationTable: "my_project.my_dataset.my_table"
    format: CSV
    csvOptions:
      fieldDelimiter: ","
      skipLeadingRows: 1
    schema:
      fields:
      - name: lat
        type: FLOAT64
      - name: lon
        type: FLOAT64
      - name: name
        type: STRING
      - name: address
        type: STRING
      - name: location_st_point
        type: STRING

Query

Csv Writer

Load

More Related Blueprints

BigQuery Ingest SQL

Extract data from Cassandra into a CSV file, and load it to BigQuery

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra