Blueprints
Schedule a Python data ingestion job to extract data from an API and load it to DuckDB using dltHub (data load tool)
Source
yaml
id: dlt-elt-chess-api-to-duckdb
namespace: company.team
tasks:
- id: chess_api_to_duckdb
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: python:slim
beforeCommands:
- pip install dlt[duckdb]
warningOnStdErr: false
script: |
import dlt
import requests
pipeline = dlt.pipeline(
pipeline_name='chess_pipeline',
destination='duckdb',
dataset_name='player_data'
)
data = []
for player in ['magnuscarlsen', 'rpragchess']:
response = requests.get(f'https://api.chess.com/pub/player/{player}')
response.raise_for_status()
data.append(response.json())
# Extract, normalize, and load the data
pipeline.run(data, table_name='player')
triggers:
- id: daily
type: io.kestra.plugin.core.trigger.Schedule
disabled: true
cron: 0 9 * * *
About this blueprint
Trigger Ingest Data Schedule
This flow loads data from the Chess.com API into DuckDB destination. The flow is scheduled to run daily at 9 AM.