Transform data from CSV files with Pandas in Python containers (in parallel)

Source

yaml
id: python-csv-each-parallel
namespace: company.team

tasks:
  - id: csv
    type: io.kestra.plugin.core.flow.ForEach
    # 0 removes the concurrency cap, so all listed files are processed at once
    concurrencyLimit: 0
    values:
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/products.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv
    tasks:
      - id: pandas
        type: io.kestra.plugin.scripts.python.Script
        warningOnStdErr: false
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
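        script: |
          import pandas as pd

          # Kestra renders taskrun.value to the current CSV URL before the script runs
          df = pd.read_csv("{{ taskrun.value }}")
          # Print column names, dtypes, and non-null counts to the task logs
          df.info()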

About this blueprint

Parallel Python

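This flow iterates over a list of CSV file URLs and processes all of them in parallel. Each file is handled by its own Python script, which runs in a separate Docker container, loads the file with Pandas, and prints a summary of its columns.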
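If running every file at once is too heavy for the host, the same flow can be capped. The sketch below is a variant of the blueprint, not part of it: it assumes a limit of two concurrent iterations is wanted, and the flow id `python-csv-each-limited` is a hypothetical name chosen for illustration. Everything else is copied from the source above.

```yaml
# Hypothetical variant of the blueprint; only the id and concurrencyLimit differ
id: python-csv-each-limited
namespace: company.team

tasks:
  - id: csv
    type: io.kestra.plugin.core.flow.ForEach
    # Assumed limit: at most two files are processed at the same time
    concurrencyLimit: 2
    values:
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/products.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv
    tasks:
      - id: pandas
        type: io.kestra.plugin.scripts.python.Script
        warningOnStdErr: false
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        script: |
          import pandas as pd
          df = pd.read_csv("{{ taskrun.value }}")
          df.info()
```

With this setting, the third file should start only after one of the first two iterations finishes. Setting `concurrencyLimit: 1` would process the files one after another.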

Plugins used in this blueprint: For Each, Script, Docker
