List objects in an S3 bucket and process them in parallel

Source

yaml

id: s3-map-over-objects
namespace: company.team

inputs:
  - id: bucket
    type: STRING
    defaults: declarative-data-orchestration

tasks:
  - id: list_objects
    type: io.kestra.plugin.aws.s3.List
    bucket: "{{ inputs.bucket }}"
    prefix: powerplant/
    accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
    secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
    region: "{{ secret('AWS_DEFAULT_REGION') }}"

  - id: print_objects
    type: io.kestra.plugin.core.log.Log
    message: "Found objects {{ outputs.list_objects.objects }}"

  - id: map_over_s3_objects
    type: io.kestra.plugin.core.flow.ForEach
    concurrencyLimit: 0
    values: "{{ outputs.list_objects.objects }}"
    tasks:
      - id: filename
        type: io.kestra.plugin.core.log.Log
        message: "Filename {{ json(taskrun.value).key }} with size {{ json(taskrun.value).size }}"

About this blueprint

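This flow lists the objects under a given prefix (powerplant/) in an S3 bucket, logs the full list, and then iterates over the objects in parallel, logging each object's key and size. The ForEach task sets concurrencyLimit: 0, which places no cap on the number of iterations running at once. The flow expects AWS credentials to be stored as the secrets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION.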
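The blueprint only logs each object, so here is a minimal sketch of how the loop could do real per-object work while keeping parallelism bounded. It is a variant of the map_over_s3_objects task, not part of the original blueprint: the limit of 4, the task id download_object, and the choice of the S3 Download task are illustrative assumptions and should be checked against the plugin version you run.

```yaml
tasks:
  # Sketch only: replaces the map_over_s3_objects task in the flow above.
  - id: map_over_s3_objects
    type: io.kestra.plugin.core.flow.ForEach
    # Assumed value: run at most 4 iterations at a time.
    # 0 removes the cap (as in the blueprint); 1 runs iterations sequentially.
    concurrencyLimit: 4
    values: "{{ outputs.list_objects.objects }}"
    tasks:
      # Hypothetical task id; downloads the object for the current iteration.
      - id: download_object
        type: io.kestra.plugin.aws.s3.Download
        bucket: "{{ inputs.bucket }}"
        # taskrun.value is a JSON string, so it is parsed before reading the key.
        key: "{{ json(taskrun.value).key }}"
        accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
        secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
        region: "{{ secret('AWS_DEFAULT_REGION') }}"
```

A bounded limit is a reasonable default when the prefix may contain many objects, since unlimited concurrency starts one iteration per object at the same time.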
