Source
```yaml
id: pinot-to-bigquery
namespace: company.team

tasks:
  - id: query_pinot
    type: io.kestra.plugin.jdbc.pinot.Query
    url: jdbc:pinot://192.168.0.106:9000
    sql: |
      SELECT lat, lon, name, address, cast(location_st_point as string)
      FROM starbucksStores
    fetchType: STORE

  - id: write_to_csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query_pinot.uri }}"

  - id: load_bigquery
    type: io.kestra.plugin.gcp.bigquery.Load
    from: "{{ outputs.write_to_csv.uri }}"
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    projectId: my_project
    destinationTable: my_project.my_dataset.my_table
    format: CSV
    csvOptions:
      fieldDelimiter: ","
      skipLeadingRows: 1
    schema:
      fields:
        - name: lat
          type: FLOAT64
        - name: lon
          type: FLOAT64
        - name: name
          type: STRING
        - name: address
          type: STRING
        - name: location_st_point
          type: STRING
```
About this blueprint
This flow extracts data from an Apache Pinot table, writes it to a CSV file, and then loads the CSV data into BigQuery.
Note that we cast the bytes column location_st_point to STRING while extracting the data from Pinot.
The GCP credentials are configured through the GCP_SERVICE_ACCOUNT_JSON secret, referenced in the flow with the secret() function.
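If you run the open-source edition of Kestra, one way to supply this secret is as a base64-encoded environment variable prefixed with SECRET_. A minimal sketch; service-account.json is a placeholder path to your GCP key file:

```bash
# Kestra (open source) reads secrets from environment variables that are
# prefixed with SECRET_ and contain base64-encoded values.
# service-account.json is a hypothetical path to your key file.
export SECRET_GCP_SERVICE_ACCOUNT_JSON=$(base64 -w 0 service-account.json)
# On macOS, use: base64 -i service-account.json
```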
The BigQuery table is created at runtime using the schema provided in the flow code.
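After the flow completes, you can verify the result with the bq CLI, assuming it is installed and authenticated; the table name below matches the destinationTable in the flow above:

```bash
# Inspect the schema the Load task created at runtime...
bq show --schema --format=prettyjson my_project:my_dataset.my_table

# ...and spot-check a few of the loaded rows.
bq query --use_legacy_sql=false \
  'SELECT name, address FROM `my_project.my_dataset.my_table` LIMIT 5'
```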
You can set up Apache Pinot locally with Docker using:
```bash
docker pull apachepinot/pinot:latest
docker run -p 9000:9000 apachepinot/pinot:latest QuickStart -type batch
```
The QuickStart mode automatically loads several example datasets, including the starbucksStores dataset used in this flow.
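Before running the flow, you can confirm that the dataset is queryable. A quick sketch, assuming the QuickStart container is up and that your Pinot version exposes the controller's /sql endpoint on port 9000:

```bash
# Send a test query through the Pinot controller, which forwards it to the broker.
curl -s -X POST http://localhost:9000/sql \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) FROM starbucksStores"}'
```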