Blueprints

Pandas ETL - passing data between Python script tasks running in separate containers

Source

yaml
id: wdir-pandas-python-outputs
namespace: company.team

tasks:
  - id: etl
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: extract_csv
        type: io.kestra.plugin.scripts.python.Script
        warningOnStdErr: false
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        script: |
          import pandas as pd
          data = {
              'Column1': ['A', 'B', 'C', 'D'],
              'Column2': [1, 2, 3, 4],
              'Column3': [5, 6, 7, 8]
          }
          df = pd.DataFrame(data)
          print(df.head())
          df.to_csv("raw_data.csv", index=False)

      - id: transform_and_load_csv
        type: io.kestra.plugin.scripts.python.Script
        warningOnStdErr: false
        taskRunner:
          type: io.kestra.plugin.scripts.runner.docker.Docker
        containerImage: ghcr.io/kestra-io/pydata:latest
        outputFiles:
          - final.csv
        script: |
          import pandas as pd
          df = pd.read_csv("raw_data.csv")
          df['Column4'] = df['Column2'] + df['Column3']
          print(df.head())
          df.to_csv("final.csv", index=False)

About this blueprint

Python Data Outputs

This flow demonstrates how to use the WorkingDirectory task to persist data between multiple Python script tasks running in separate containers. The first task stores the data as a CSV file called "raw_data.csv". The second task loads the CSV file, transforms it and outputs the final CSV file to Kestra's internal storage. You can download that file from the Outputs tab on the Execution's page. Kestra's internal storage allows you to use the output in other tasks in the flow, even if those tasks are processed in different containers. The final result file will also be available as a downloadable artifact, making it easier to share the results of your data workflow with various business stakeholders.

Working Directory

Script

Docker

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra