Source

```yaml
id: cloudquery-sync
namespace: company.team

tasks:
  - id: hn_to_duckdb
    type: io.kestra.plugin.cloudquery.Sync
    env:
      CLOUDQUERY_API_KEY: "9ITIyNYb8s3Cr8nSiV4KcKVPGJNSd6u8"
    incremental: false
    configs:
      - kind: source
        spec:
          name: hackernews
          path: cloudquery/hackernews
          version: v3.0.13
          tables:
            - "*"
          destinations:
            - duckdb
          spec:
            item_concurrency: 100
            start_time: "{{ trigger.date ?? execution.startDate | dateAdd(-1, 'DAYS') }}"
      - kind: destination
        spec:
          name: duckdb
          path: cloudquery/duckdb
          version: v4.2.10
          write_mode: overwrite-delete-stale
          spec:
            connection_string: hn.db

triggers:
  - id: schedule
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "@daily"
    timezone: US/Eastern
```
About this blueprint
This blueprint demonstrates how to schedule a CloudQuery data ingestion pipeline that syncs external data sources into DuckDB for analytics, exploration, and downstream processing.
In this example, the automation performs a daily batch sync that:
- Extracts data from the Hacker News source using CloudQuery.
- Loads the data into DuckDB using the CloudQuery DuckDB destination plugin.
- Supports high-throughput ingestion with configurable concurrency.
- Runs on a fixed schedule using a cron trigger.
The flow dynamically sets the sync start time to one day before the scheduled execution, enabling controlled backfills while avoiding large historical reprocessing. This makes it easy to ingest recent data in small, predictable batches.
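The start-time logic can be sketched in plain Python (the function name below is illustrative, not part of the flow): `trigger.date ?? execution.startDate` falls back to the execution's start when there is no trigger date, and `dateAdd(-1, 'DAYS')` then shifts that timestamp back one day.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def sync_start_time(trigger_date: Optional[datetime],
                    execution_start: datetime) -> datetime:
    """Mirror of `{{ trigger.date ?? execution.startDate | dateAdd(-1, 'DAYS') }}`:
    use the trigger date if present, otherwise the execution start,
    then subtract one day so each run ingests a one-day window."""
    base = trigger_date if trigger_date is not None else execution_start
    return base - timedelta(days=1)

run = datetime(2024, 5, 2, tzinfo=timezone.utc)
print(sync_start_time(None, run).isoformat())  # 2024-05-01T00:00:00+00:00
```

Because each daily run looks back exactly one day, consecutive runs produce contiguous, non-overlapping windows.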
Alternatively, you can enable incremental ingestion by setting the incremental flag to true. In incremental mode, the sync cursor is stored automatically and reused on subsequent runs, ensuring that only new data is fetched and loaded.
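A minimal sketch of the same task in incremental mode (only the changed lines shown; the source and destination configs stay as above):

```yaml
- id: hn_to_duckdb
  type: io.kestra.plugin.cloudquery.Sync
  incremental: true   # cursor is stored by Kestra and reused on the next run
  configs:
    # same source/destination configs as above;
    # the explicit start_time expression can be dropped,
    # since the stored cursor determines where the sync resumes
```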
This pattern is ideal for:
- Scheduled batch ingestion pipelines
- Analytics workflows using DuckDB
- Lightweight data warehousing
- Backfilling time-based datasets
- Replacing custom ingestion scripts with declarative configs
To configure additional CloudQuery sources or destinations, visit the CloudQuery Integrations page. You can copy the YAML configuration for supported plugins and adapt it to your own ingestion pipelines. The documentation also provides a full list of available tables and configuration options.
Premium CloudQuery plugins require an API key, which can be generated here: https://docs.cloudquery.io/docs/deployment/generate-api-key
The API key should be stored securely and passed as an environment variable, for example:
```yaml
- id: hn_to_duckdb
  type: io.kestra.plugin.cloudquery.Sync
  env:
    CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
```