Data Ingestion Pipeline: Sync Hacker News Data to CSV Files Using CloudQuery

Source

```yaml
id: cloudquery-sync-hn-to-parquet
namespace: company.team

tasks:
  - id: hn_to_parquet
    type: io.kestra.plugin.cloudquery.CloudQueryCLI
    inputFiles:
      config.yml: |
        kind: source
        spec:
          name: hackernews
          path: cloudquery/hackernews
          version: v3.7.7
          tables: ["*"]
          destinations:
            - file
          spec:
            item_concurrency: 100
            start_time: "{{ execution.startDate | dateAdd(-1, 'DAYS') }}"
        ---
        kind: destination
        spec:
          name: file
          path: cloudquery/file
          version: v5.4.6
          spec:
            path: "{% raw %}{{TABLE}}/{{UUID}}.{{FORMAT}}{% endraw %}"
            format: csv
    outputFiles:
      - "**/*.csv"
    env:
      CLOUDQUERY_API_KEY: 9ITIyNYb8s3Cr8nSiV4KcKVPGJNSd6u8
    commands:
      - cloudquery sync config.yml --log-console --log-level=warn
```

About this blueprint


This blueprint demonstrates how to build a data ingestion pipeline that extracts data from Hacker News using CloudQuery and stores it locally as structured CSV files for analytics, exploration, or downstream processing.

It shows how to:

  1. Use CloudQuery as a data extraction engine to sync data from the Hacker News API.
  2. Configure CloudQuery sources and destinations using inline YAML configuration files.
  3. Export all available Hacker News tables and persist them as CSV files on disk.
  4. Control ingestion parameters such as concurrency and incremental start time to efficiently sync recent data.
  5. Orchestrate CloudQuery ingestion jobs using Kestra for repeatable, automated data pipelines.

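To make the ingestion repeatable (step 5 above), the flow can be run on a schedule with a Kestra trigger. A minimal sketch, assuming a daily run; the trigger `id` and cron expression are illustrative:

```yaml
triggers:
  - id: daily_sync
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 9 * * *"   # every day at 09:00
```

Because `start_time` is computed as `execution.startDate` minus one day, a daily schedule keeps each run's sync window contiguous with the previous one.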
This pattern can easily be extended to ingest data from any CloudQuery-supported source and load it into any supported destination, including cloud storage, data warehouses, or analytics-ready file formats.
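For example, swapping the local `file` destination for object storage only requires a different `destination` block in `config.yml`. A sketch assuming CloudQuery's S3 destination plugin; the plugin version, bucket name, and region are placeholders to adapt:

```yaml
kind: destination
spec:
  name: s3
  path: cloudquery/s3
  version: v1.0.0   # placeholder; pin to the latest release
  spec:
    bucket: my-data-lake   # placeholder bucket name
    region: us-east-1
    path: "hackernews/{% raw %}{{TABLE}}/{{UUID}}.{{FORMAT}}{% endraw %}"
    format: csv
```

The `{% raw %}` wrapper is needed here for the same reason as in the main flow: it stops Kestra from evaluating CloudQuery's path variables as Pebble expressions.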

To prevent Kestra from evaluating CloudQuery's `{{TABLE}}`, `{{UUID}}`, and `{{FORMAT}}` path variables as Pebble expressions, this flow wraps the destination path in `{% raw %} ... {% endraw %}` tags.

Make sure to replace the sample API key with your own CloudQuery API key.
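Rather than hardcoding the key in the flow definition, Kestra's `secret()` function can resolve it at runtime. A sketch assuming a secret named `CLOUDQUERY_API_KEY` has been configured in your Kestra instance:

```yaml
env:
  CLOUDQUERY_API_KEY: "{{ secret('CLOUDQUERY_API_KEY') }}"
```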

CloudQuery CLI
