Source
id: data-pipeline-assets
namespace: kestra.company
tasks:
- id: create_staging_layer_asset
type: io.kestra.plugin.jdbc.duckdb.Query
sql: |
CREATE TABLE IF NOT EXISTS trips AS
select VendorID, passenger_count, trip_distance from sample_data.nyc.taxi limit 10;
assets:
inputs:
- id: sample_data.nyc.taxi
outputs:
- id: trips
namespace: "{{flow.namespace}}"
type: io.kestra.plugin.ee.assets.Table
metadata:
model_layer: staging
- id: for_each
type: io.kestra.plugin.core.flow.ForEach
values:
- passenger_count
- trip_distance
tasks:
- id: create_mart_layer_asset
type: io.kestra.plugin.jdbc.duckdb.Query
sql: SELECT AVG({{taskrun.value}}) AS avg_{{taskrun.value}} FROM trips;
assets:
inputs:
- id: trips
outputs:
- id: avg_{{taskrun.value}}
type: io.kestra.plugin.ee.assets.Table
namespace: "{{flow.namespace}}"
metadata:
model_layer: mart
pluginDefaults:
- type: io.kestra.plugin.jdbc.duckdb
values:
url: "jdbc:duckdb:md:my_db?motherduck_token={{ secret('MOTHERDUCK_TOKEN') }}"
fetchType: STORE
About this blueprint
Core Data
This blueprint showcases how to build analytics-ready data assets in Kestra using DuckDB while automatically capturing data lineage, dependencies, and model layers.
It demonstrates how Kestra Assets turn a traditional SQL-based data pipeline into a fully governed data workflow, making tables discoverable, traceable, and reusable across teams.
What this pipeline does:
- Register an external data asset
The flow declaressample_data.nyc.taxias an input asset, representing an external source table. This allows Kestra to track lineage even when data originates outside the platform. - Create a staging-layer asset
Atripstable is materialized in DuckDB and registered as aTableasset withmodel_layer: staging.
This follows modern analytics engineering practices (staging → marts) and makes the table visible in Kestra’s asset catalog. - Generate mart-layer assets dynamically
Using aForEachloop, the flow computes aggregated metrics (avg_passenger_count,avg_trip_distance) and registers each result as a mart-layer table asset.
Each asset is explicitly linked to its upstream dependencies. - Automatic data lineage and dependency graph
Kestra automatically builds a complete lineage graph showing:- external source → staging table → mart tables
This enables impact analysis, auditing, and clear ownership of analytics outputs.
- external source → staging table → mart tables
Why this matters:
- Enforces data modeling best practices (staging & mart layers)
- Provides end-to-end data lineage without manual documentation
- Creates first-class analytics assets that can be reused by downstream pipelines, dashboards, or teams
- Bridges the gap between orchestration and data governance
This pattern is ideal for teams building modern analytics platforms, DuckDB-based data warehouses, or MotherDuck-powered analytical stacks that require transparency, observability, and trust in data pipelines.
More Related Blueprints