Blueprints

List objects in an S3 bucket and process them in parallel

Source

yaml
id: s3-map-over-objects
namespace: company.team

inputs:
  - id: bucket
    type: STRING
    defaults: declarative-data-orchestration

tasks:
  - id: list_objects
    type: io.kestra.plugin.aws.s3.List
    bucket: "{{ inputs.bucket }}"
    prefix: powerplant/
    accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
    secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
    region: "{{ secret('AWS_DEFAULT_REGION') }}"

  - id: print_objects
    type: io.kestra.plugin.core.log.Log
    message: "Found objects {{ outputs.list_objects.objects }}"

  - id: map_over_s3_objects
    type: io.kestra.plugin.core.flow.ForEach
    concurrencyLimit: 0
    values: "{{ outputs.list_objects.objects }}"
    tasks:
      - id: filename
        type: io.kestra.plugin.core.log.Log
        message: "Filename {{ json(taskrun.value).key }} with size {{
          json(taskrun.value).size }}"

About this blueprint

S3 Parallel Variables Inputs Outputs

This flow lists objects with a specific prefix in an S3 bucket and then processes each object in parallel. This flow assumes AWS credentials stored as secrets AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION.

List

Log

For Each

New to Kestra?

Use blueprints to kickstart your first workflows.

Get started with Kestra