Build an ETL Pipeline with Airbyte Cloud and dbt Core for SaaS Analytics

Source

```yaml
id: airbyte-cloud-dbt
namespace: company.team

tasks:
  - id: data_ingestion
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: salesforce
        type: io.kestra.plugin.airbyte.cloud.jobs.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ab

      - id: google_analytics
        type: io.kestra.plugin.airbyte.cloud.jobs.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12cd

      - id: facebook_ads
        type: io.kestra.plugin.airbyte.cloud.jobs.Sync
        connectionId: e3b1ce92-547c-436f-b1e8-23b6936c12ef

  - id: dbt
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-demo
        branch: main

      - id: dbt_build
        type: io.kestra.plugin.dbt.cli.Build
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/dbt-bigquery:latest # image used by the Docker task runner
        dbtPath: /usr/local/bin/dbt
        inputFiles:
          .profile/profiles.yml: |
            jaffle_shop:
              outputs:
                dev:
                  type: bigquery
                  dataset: your_big_query_dataset_name
                  project: your_big_query_project
                  fixed_retries: 1
                  keyfile: sa.json
                  location: EU
                  method: service-account
                  priority: interactive
                  threads: 8
                  timeout_seconds: 300
              target: dev
          sa.json: "{{ secret('GCP_CREDS') }}"

pluginDefaults:
  - type: io.kestra.plugin.airbyte.cloud.jobs.Sync
    values:
      token: "{{ secret('AIRBYTE_CLOUD_API_TOKEN') }}"

About this blueprint

This blueprint orchestrates a modern ETL pipeline by combining parallel SaaS data ingestion with self-managed dbt Core transformations, using Airbyte Cloud for extraction and the dbt CLI for analytics modeling.

It performs the following actions:

  • Runs multiple Airbyte Cloud syncs in parallel to ingest data from SaaS sources such as Salesforce, Google Analytics, and advertising platforms.
  • Waits for all ingestion jobs to complete before starting transformations, since the Parallel task finishes only once every sync inside it has completed.
  • Clones a dbt project repository and executes dbt Core commands using a containerized runtime.
  • Transforms raw ingested data into analytics-ready tables in a cloud data warehouse.

This pattern is designed for analytics engineering teams and data platform teams that want full control over dbt execution while still leveraging managed SaaS ingestion with Airbyte Cloud.

Configuration:

  • Add an Airbyte Cloud API token as a secret (AIRBYTE_CLOUD_API_TOKEN).
  • Configure cloud warehouse credentials (for example, a GCP service account for BigQuery) as secrets; see the secrets sketch below.
  • Update the Airbyte connectionId values to match your Airbyte Cloud workspace; the lookup sketch below shows one way to list them.
  • Customize the dbt project repository, profiles, dataset, and execution settings to match your analytics environment.
  • Optionally run this ETL pipeline on a schedule or trigger it on demand to support dashboards, reporting, or downstream data products; see the trigger sketch below.
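
In open-source Kestra, secrets such as AIRBYTE_CLOUD_API_TOKEN and GCP_CREDS are typically supplied as base64-encoded environment variables prefixed with SECRET_ (managed secret backends work differently). A minimal docker-compose sketch, assuming a standard Kestra container:

```yaml
# Hedged sketch: expose flow secrets to open-source Kestra.
# Values must be base64-encoded; the SECRET_ prefix is required.
services:
  kestra:
    image: kestra/kestra:latest
    environment:
      SECRET_AIRBYTE_CLOUD_API_TOKEN: <base64-encoded Airbyte API token>
      SECRET_GCP_CREDS: <base64-encoded GCP service account JSON>
```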
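
To look up the connectionId values, the Airbyte Cloud API exposes a connection listing endpoint that accepts the same API token. A sketch using Kestra's core HTTP task; the flow id and namespace below are placeholders:

```yaml
# Hedged sketch: list Airbyte Cloud connections (and their IDs)
# by calling GET https://api.airbyte.com/v1/connections.
id: list_airbyte_connections # hypothetical helper flow
namespace: company.team

tasks:
  - id: list_connections
    type: io.kestra.plugin.core.http.Request
    uri: https://api.airbyte.com/v1/connections
    headers:
      Authorization: "Bearer {{ secret('AIRBYTE_CLOUD_API_TOKEN') }}"
```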
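
To run the pipeline on a schedule instead of on demand, append a Schedule trigger to the flow. The trigger id and cron expression below are illustrative:

```yaml
# Hedged sketch: run the flow every day at 06:00.
triggers:
  - id: daily_refresh # hypothetical trigger id
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 6 * * *"
```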

By orchestrating ingestion and transformation in a single automation, this blueprint ensures fresh, consistent, and analytics-ready data for modern BI and analytics workloads.
