Blueprints

Build governed analytics assets with DuckDB using data lineage, staging, and mart layers

Source

yaml
id: data-pipeline-assets
namespace: kestra.company

tasks:
  - id: create_staging_layer_asset
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
      CREATE TABLE IF NOT EXISTS trips AS
      select VendorID, passenger_count, trip_distance from sample_data.nyc.taxi limit 10;
    assets:
      inputs:
        - id: sample_data.nyc.taxi
      outputs:
        - id: trips
          namespace: "{{flow.namespace}}"
          type: io.kestra.plugin.ee.assets.Table
          metadata:
            model_layer: staging

  - id: for_each
    type: io.kestra.plugin.core.flow.ForEach
    values:
      - passenger_count
      - trip_distance
    tasks:
      - id: create_mart_layer_asset
        type: io.kestra.plugin.jdbc.duckdb.Query
        sql: SELECT AVG({{taskrun.value}}) AS avg_{{taskrun.value}} FROM trips;
        assets:
          inputs:
            - id: trips
          outputs:
            - id: avg_{{taskrun.value}}
              type: io.kestra.plugin.ee.assets.Table
              namespace: "{{flow.namespace}}"
              metadata:
                model_layer: mart

pluginDefaults:
  - type: io.kestra.plugin.jdbc.duckdb
    values:
      url: "jdbc:duckdb:md:my_db?motherduck_token={{ secret('MOTHERDUCK_TOKEN') }}"
      fetchType: STORE

About this blueprint

Core Data

This blueprint showcases how to build analytics-ready data assets in Kestra using DuckDB while automatically capturing data lineage, dependencies, and model layers.

It demonstrates how Kestra Assets turn a traditional SQL-based data pipeline into a fully governed data workflow, making tables discoverable, traceable, and reusable across teams.

What this pipeline does:

  • Register an external data asset
    The flow declares sample_data.nyc.taxi as an input asset, representing an external source table. This allows Kestra to track lineage even when data originates outside the platform.
  • Create a staging-layer asset
    A trips table is materialized in DuckDB and registered as a Table asset with model_layer: staging.
    This follows modern analytics engineering practices (staging → marts) and makes the table visible in Kestra’s asset catalog.
  • Generate mart-layer assets dynamically
    Using a ForEach loop, the flow computes aggregated metrics (avg_passenger_count, avg_trip_distance) and registers each result as a mart-layer table asset.
    Each asset is explicitly linked to its upstream dependencies.
  • Automatic data lineage and dependency graph
    Kestra automatically builds a complete lineage graph showing:
    • external source → staging table → mart tables
      This enables impact analysis, auditing, and clear ownership of analytics outputs.

Why this matters:

  • Enforces data modeling best practices (staging & mart layers)
  • Provides end-to-end data lineage without manual documentation
  • Creates first-class analytics assets that can be reused by downstream pipelines, dashboards, or teams
  • Bridges the gap between orchestration and data governance

This pattern is ideal for teams building modern analytics platforms, DuckDB-based data warehouses, or MotherDuck-powered analytical stacks that require transparency, observability, and trust in data pipelines.

Query

Table

For Each

duckdb

More Related Blueprints

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra