Detect New Files in S3 and Process Them in Python

Source

```yaml
id: s3-trigger-python
namespace: company.team

variables:
  bucket: s3-bucket
  region: eu-west-2

tasks:
  - id: process_data
    type: io.kestra.plugin.scripts.python.Commands
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/kestrapy:latest
    namespaceFiles:
      enabled: true
    inputFiles:
      input.csv: "{{ read(trigger.objects[0].uri) }}"
    outputFiles:
      - data.csv
    commands:
      - python process_data.py

triggers:
  - id: watch
    type: io.kestra.plugin.aws.s3.Trigger
    interval: "PT1S"
    accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
    secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
    region: "{{ vars.region }}"
    bucket: "{{ vars.bucket }}"
    filter: FILES
    action: MOVE
    moveTo:
      key: archive/
    maxKeys: 1
```

About this blueprint

S3 Trigger Python

This flow is triggered whenever a new file is detected in the specified S3 bucket. It downloads the file into Kestra's internal storage and moves the S3 object to an archive folder (i.e., an S3 object prefix named archive). The Python code reads the file as an inputFile named input.csv and processes it to generate a new file named data.csv. It's recommended to set the accessKeyId and secretKeyId properties as secrets; this flow assumes the AWS credentials are stored as secrets named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
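
The contents of process_data.py are not part of the blueprint; the script is expected to exist as a namespace file alongside the flow (namespaceFiles is enabled on the task). A minimal sketch, assuming the job is a simple CSV transformation — the header normalization below is purely illustrative:

```python
# process_data.py -- a namespace file referenced by the flow's commands.
# Kestra mounts the triggering S3 object as input.csv (via inputFiles)
# and captures data.csv (via outputFiles) into internal storage.
# The transformation here is a hypothetical placeholder: it normalizes
# the header row and copies the data rows through unchanged.
import csv

with open("input.csv", newline="") as src, open("data.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader, [])
    writer.writerow([col.strip().lower() for col in header])
    writer.writerows(reader)
```

Any script that reads input.csv and writes data.csv fits this flow; data.csv is what Kestra collects as the task's output file.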

