Run BigQuery jobs on the event that matters.

BigQuery runs the SQL and the warehouse. Kestra decides when each job fires, what feeds it, and how much it scans. Load on a GCS arrival, refresh only the changed partitions, chain with ingestion and dbt, and trace every query in one execution history.

Blueprints for BigQuery orchestration.

BigQuery Scheduled Queries fire SQL on a cron, but they do not load from object storage, chain to other tools, retry one step, or reprocess only the partitions that changed. Kestra runs above BigQuery: it loads the moment a file lands in GCS, reacts to Pub/Sub in real time, chains BigQuery with Airbyte ingestion and dbt transforms, and refreshes only the changed date range so cost tracks the delta. A transient transform retry never re-runs the load.

Run dbt transformations on BigQuery from Git Open blueprint
Refresh only the partitions that changed Open blueprint
Micro-batch Pub/Sub events into BigQuery Open blueprint

Above BigQuery Scheduled Queries.

BigQuery runs the SQL, the storage, and the slots. Kestra runs the steps around the query: what triggers it, what feeds it, how much it scans, and where the cross-tool audit trail lives.

Event-driven loads, not query cron

Scheduled Queries fire on a fixed cadence whether the source is ready or not. Kestra triggers a BigQuery job the moment the upstream step confirms: io.kestra.plugin.gcp.gcs.Trigger on a new object, pubsub.RealtimeTrigger per message, or bigquery.Trigger when a sentinel query returns rows. The load runs on the event, not the next tick.

Load, query, and extract orchestrated

Kestra runs bigquery.LoadFromGcs to load objects, bigquery.Query to run SQL jobs, and bigquery.ExtractToGcs to export, as ordered steps with per-step retries. A failed query retries without re-loading the data, and outputs flow forward to whatever reads the table next.

Partition-aware incremental refreshes

Reprocessing a whole table to update one day is slow and expensive. Kestra targets only the changed range: bigquery.DeletePartitions clears the affected dates, then bigquery.LoadFromGcs reloads just those partitions. Cost and runtime track the delta, and clearing before loading makes a rerun safe to repeat.

Full data stack around the warehouse

BigQuery has no view of the Airbyte sync that fed it or the dbt models downstream. Kestra runs ingestion, the load, the transform, and the Hightouch activation as steps in one flow with job IDs flowing forward and one shared execution ID.

Cost guardrails before a query runs

BigQuery bills by bytes scanned. A flow runs a dry-run query first, branches on the estimate with an io.kestra.plugin.core.flow.If task, and only runs the real job under a threshold. Pairing the dry-run guardrail with partition-aware loads keeps both bytes scanned and runtime tied to the actual delta.

Self-service queries for analysts

Analysts should not need console access to run a parameterized export. Kestra Apps wraps a Query flow in a typed form: pick the date range, run the query, get the result file. Every run lands in execution history with the requesting user, so self-service is still audited.

How teams use BigQuery and Kestra

Patterns cloud data teams run in production today. Each one shows the flow end to end, with the real plugin classes in play.

Event-driven

Load BigQuery when a file lands in GCS

The GCS Trigger fires on a new object, Kestra runs LoadFromGcs into BigQuery, then triggers dbt against the loaded table and confirms on Slack. The load runs on arrival, not on a clock.

Trigger on arrival

gcs.Trigger fires on CREATE or UPDATE under a prefix.

Load then transform

dbt runs only after LoadFromGcs confirms the table is loaded.

Move processed files

The trigger can MOVE or DELETE objects so they do not re-fire.

gcs trigger
object lands
bigquery load
LoadFromGcs
dbt build
transform
slack
on complete
Streaming

Stream Pub/Sub events into BigQuery

The pubsub.RealtimeTrigger starts one execution per message. Kestra routes by payload, runs a bigquery.Query insert, and acks the message. Use the batch pubsub.Trigger for grouped micro-batch inserts instead.

One execution per message

RealtimeTrigger fans each Pub/Sub message into its own run.

Route by payload

A Switch sends each message type down its own path.

Batch mode available

Swap to pubsub.Trigger for grouped micro-batch inserts.

pubsub trigger
per message
route
by type
bigquery insert
write row
notify
on event
Full stack

Ingest, load, transform, activate around BigQuery

Airbyte syncs into GCS, Kestra loads BigQuery, builds dbt models, then fires the Hightouch activation. One flow, one execution ID, step-local retries across every service.

One execution ID across services

GCS object, BigQuery job ID, and dbt results in one view.

Step retries stay local

A dbt retry never re-runs the load or the Airbyte sync.

Outputs flow forward

The load's destination table feeds the dbt and activation steps.

airbyte sync
ingest to GCS
bigquery load
LoadFromGcs
dbt build
transform
hightouch sync
activate
Incremental

Refresh only the partitions that changed

A daily load should not rescan the whole table. Kestra runs DeletePartitions for the affected date range, then LoadFromGcs reloads just those partitions. Cost and runtime track the delta, not the table size.

Touch only the delta

DeletePartitions plus LoadFromGcs reprocess just the changed dates.

Cost tracks the change

Runtime and bytes scanned scale with the delta, not the table.

Idempotent reloads

Clearing then loading the range makes a rerun safe to repeat.

schedule
daily window
delete partitions
clear range
load partitions
reload range
notify
on complete
Big data

Extract to GCS and run a Dataproc Spark batch

Kestra exports a BigQuery table to GCS with ExtractToGcs, submits a Spark batch to Dataproc Serverless, then loads the result back into BigQuery. The Spark job is one step in the flow, not a standalone island.

Spark as a step

PySparkSubmit runs on Dataproc Serverless inside the flow.

Extract and reload

BigQuery to GCS to Spark to BigQuery, all ordered.

One history for the batch

The extract, Spark job, and reload share one execution ID.

bigquery extract
to GCS
dataproc spark
PySpark batch
bigquery load
result back
notify
on complete
Kestra is the unifying layer for our data and workflows. You can start small, but then there is no limit to the possibilities and scalability of such an open architecture.
Julien Henrion, Head of Data Engineering at Leroy Merlin France
+900%Increase in data production
5000+Workflows created

Kestra vs the orchestration alternatives BigQuery teams evaluate

Capability
Scheduled Queries
Cloud Composer
Dagster
Load from GCS, trigger on arrival
gcs.Trigger + LoadFromGcs
SQL only, no loadsSensors, Python requiredCustom Python code
Chain BigQuery with dbt and Airbyte
Native plugins
SQL onlyOperators, Python DAGsOps, Python required
Partition-aware incremental loads
DeletePartitions + LoadFromGcs
Manual SQL DMLCustom operator logicCustom Python code
Dry-run cost check before a query
Branch on scanned bytes
NoCustom operatorCustom code
Run fully self-hosted, off GCP
Docker, Kubernetes, air-gapped
GCP onlyManaged GCP onlySelf-host or cloud
Self-service queries for analysts
Kestra Apps
NoNo native form layerNo native form layer
Declarative YAML, IaC-friendly
YAML + Terraform provider
SQL + consolePython DAGsPython assets

BigQuery & Kestra: common questions

Find answers to your questions right here, and don't hesitate to Contact Us if you couldn't find what you're looking for.

See How

Ready to orchestrate your BigQuery jobs?

Trigger on GCS and Pub/Sub events, refresh only the changed partitions, chain with Airbyte and dbt, and guard query cost. Open source, self-hosted, event-driven.