Source
id: wdir-pandas-python-outputs
namespace: company.team
tasks:
- id: etl
type: io.kestra.plugin.core.flow.WorkingDirectory
tasks:
- id: extract_csv
type: io.kestra.plugin.scripts.python.Script
warningOnStdErr: false
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
script: |
import pandas as pd
data = {
'Column1': ['A', 'B', 'C', 'D'],
'Column2': [1, 2, 3, 4],
'Column3': [5, 6, 7, 8]
}
df = pd.DataFrame(data)
print(df.head())
df.to_csv("raw_data.csv", index=False)
- id: transform_and_load_csv
type: io.kestra.plugin.scripts.python.Script
warningOnStdErr: false
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
outputFiles:
- final.csv
script: |
import pandas as pd
df = pd.read_csv("raw_data.csv")
df['Column4'] = df['Column2'] + df['Column3']
print(df.head())
df.to_csv("final.csv", index=False)
About this blueprint
Python Data Outputs
This flow demonstrates how to use the WorkingDirectory
task to persist data between multiple Python script tasks running in separate containers.
The first task stores the data as a CSV file called "raw_data.csv". The second task loads the CSV file, transforms it and outputs the final CSV file to Kestra's internal storage. You can download that file from the Outputs tab on the Execution's page.
Kestra's internal storage allows you to use the output in other tasks in the flow, even if those tasks are processed in different containers. The final result file will also be available as a downloadable artifact, making it easier to share the results of your data workflow with various business stakeholders.