Transform data from CSV files with Pandas in Python containers (in parallel)

Source

id: python-csv-each-parallel
namespace: company.team

tasks:
  - id: csv
    type: io.kestra.plugin.core.flow.ForEach
    concurrencyLimit: 0
    values:
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/products.csv
      - https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv
    tasks:
      - id: pandas
        type: io.kestra.plugin.scripts.python.Script
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        script: |
          import pandas as pd
          df = pd.read_csv("{{ taskrun.value }}")
          df.info()

About this blueprint

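This flow iterates over a list of CSV file URLs with a ForEach task and processes every file in parallel. Each iteration runs its own Python script in a separate Docker container, loads the file into a Pandas DataFrame, and prints a summary of its columns and data types with df.info(). Setting concurrencyLimit: 0 removes the cap on concurrent iterations, so all three files are processed at the same time.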
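To try the per-file step outside Kestra, the fan-out can be approximated locally. The following is a minimal sketch, not part of the blueprint: it uses a thread pool instead of Docker containers, and the helper names summarize and summarize_all, as well as the small in-memory CSV samples, are assumptions made for illustration.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def summarize(source):
    """Load one CSV (path, URL, or file-like object) and return its df.info() report."""
    df = pd.read_csv(source)
    buf = io.StringIO()
    df.info(buf=buf)  # write the report into a buffer instead of stdout
    return buf.getvalue()


def summarize_all(sources):
    """Process every source concurrently, one worker per source.

    This loosely mirrors concurrencyLimit: 0 in the flow; it does not
    provide the container isolation that the Docker task runner gives.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(summarize, sources))


# Hypothetical stand-in data; replace with the CSV URLs from the flow.
samples = [
    io.StringIO("order_id,total\n1,9.5\n2,20.0\n"),
    io.StringIO("product_id,name\n1,widget\n"),
]
reports = summarize_all(samples)
for report in reports:
    print(report)
```

Passing the three URLs from the flow's values list to summarize_all should produce one report per file, assuming the machine has network access and pandas installed.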
