Extract data from Apache Pinot into a CSV format and load it into BigQuery

Source

yaml

id: pinot-to-bigquery
namespace: company.team

tasks:
  - id: query_pinot
    type: io.kestra.plugin.jdbc.pinot.Query
    url: jdbc:pinot://192.168.0.106:9000
    sql: |
      SELECT lat, lon, name, address, cast(location_st_point as string)
      FROM starbucksStores
    fetchType: STORE

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_pinot.uri }}"

  - id: load_bigquery
    type: io.kestra.plugin.gcp.bigquery.Load
    from: "{{ outputs.write_to_csv.uri }}"
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    projectId: my_project
    destinationTable: my_project.my_dataset.my_table
    format: CSV
    csvOptions:
      fieldDelimiter: ","
      skipLeadingRows: 1
    schema:
      fields:
        - name: lat
          type: FLOAT64
        - name: lon
          type: FLOAT64
        - name: name
          type: STRING
        - name: address
          type: STRING
        - name: location_st_point
          type: STRING

About this blueprint

SQL GCP

This flow extracts data from an Apache Pinot table, writes it to a CSV file, and then loads the CSV data into BigQuery. Note that we convert the bytes column location_st_point into a STRING format while extracting the data from Pinot. The GCP credentials are configured using an environment variable GCP_SERVICE_ACCOUNT_JSON. The BigQuery table will get created at runtime using the schema provided in the flow code. You can setup Apache Pinot locally with Docker using: bash docker pull apachepinot/pinot:latest docker run -p 9000:9000 apachepinot/pinot:latest QuickStart -type batch The Quickstart mode will automatically load the example datasets. One of the example datasets starbucksStores is used in this flow.

Query

Ion To Csv

Load

More Related Blueprints

SQLGCP

Load data from Postgres to BigQuery using Singer

SQLGCP

Extract data from Cassandra into a CSV file, and load it to BigQuery

SQLGCP

Extract data, mask sensitive columns using DuckDB and load it to BigQuery

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra