SubmitSteps

Add steps to an existing AWS EMR cluster.

yaml
type: "io.kestra.plugin.aws.emr.SubmitSteps"

Add a job step to an existing AWS EMR cluster

yaml
id: aws_emr_add_emr_job_steps
namespace: company.team

tasks:
  - id: add_steps_emr
    type: io.kestra.plugin.aws.emr.SubmitSteps
    accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
    secretKeyId: "{{ secret('AWS_SECRET_KEY_ID') }}"
    region: "eu-west-3"
    clusterId: j-XXXXXXXXXXXX
    steps:
        - name: Spark_job_test
          jar: "command-runner.jar"
          actionOnFailure: CONTINUE
          commands:
            - spark-submit s3://mybucket/health_violations.py --data_source s3://mybucket/food_establishment_data.csv --output_uri s3://mybucket/test-emr-output
Properties

Cluster ID.

Steps

List of steps to add to the existing cluster.

Access key ID used to connect to AWS.

If no credentials are defined, we will use the default credentials provider chain to fetch credentials.

Enable compatibility mode.

Use it to connect to S3 buckets hosted on S3-compatible services that don't support the new transport client.

The endpoint with which the SDK should communicate.

This property allows you to use a different S3 compatible storage backend.

Execution role ARN.

The Amazon Resource Name (ARN) of the runtime role for a step on the cluster. The runtime role can be a cross-account IAM role. The runtime role ARN is a combination of account ID, role name, and role type, using the following format: arn:partition:service:region:account:resource.
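
As an illustration of the ARN format described above, the fragment below sketches a task that passes a runtime role to its steps. The property name `executionRoleArn` is an assumption inferred from the "Execution role ARN" title, and the account ID and role name are placeholders.

```yaml
tasks:
  - id: add_steps_emr
    type: io.kestra.plugin.aws.emr.SubmitSteps
    region: "eu-west-3"
    clusterId: j-XXXXXXXXXXXX
    # Assumed property name; placeholder ARN in the form
    # arn:partition:service:region:account:resource (region is empty for IAM)
    executionRoleArn: "arn:aws:iam::123456789012:role/example-emr-runtime-role"
    steps:
      - name: Spark_job_test
        jar: "command-runner.jar"
        actionOnFailure: CONTINUE
        commands:
          - spark-submit s3://mybucket/health_violations.py
```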

Force path style access.

Must only be used when compatibilityMode is enabled.

AWS region with which the SDK should communicate.

Secret key ID used to connect to AWS.

If no credentials are defined, we will use the default credentials provider chain to fetch credentials.

AWS session token, retrieved from an AWS token service, used for authenticating that this user has received temporary permissions to access a given resource.

If no credentials are defined, we will use the default credentials provider chain to fetch credentials.

The AWS STS endpoint with which the SDKClient should communicate.

AWS STS Role.

The Amazon Resource Name (ARN) of the role to assume. If set, the task will use the StsAssumeRoleCredentialsProvider. If no credentials are defined, we will use the default credentials provider chain to fetch credentials.

AWS STS External Id.

A unique identifier that might be required when you assume a role in another account. This property is only used when an stsRoleArn is defined.

Default PT15M
Format duration

AWS STS Session duration.

The duration of the role session (default: 15 minutes, i.e., PT15M). This property is only used when an stsRoleArn is defined.

AWS STS Session name.

This property is only used when an stsRoleArn is defined.
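
As a sketch of how the STS properties above might fit together, the flow below assumes a role instead of passing static keys. Only `stsRoleArn` is named explicitly on this page; the names `stsRoleExternalId`, `stsRoleSessionName`, and `stsRoleSessionDuration` are assumptions inferred from the property titles, and the role ARN, secret name, and session name are placeholders.

```yaml
id: aws_emr_add_steps_with_sts
namespace: company.team

tasks:
  - id: add_steps_emr
    type: io.kestra.plugin.aws.emr.SubmitSteps
    region: "eu-west-3"
    clusterId: j-XXXXXXXXXXXX
    # Role to assume (placeholder ARN)
    stsRoleArn: "arn:aws:iam::123456789012:role/example-emr-role"
    # The three properties below use assumed names and only apply
    # when stsRoleArn is set
    stsRoleExternalId: "{{ secret('AWS_STS_EXTERNAL_ID') }}"
    stsRoleSessionName: "kestra-emr-session"
    stsRoleSessionDuration: PT30M   # ISO 8601 duration; the default is PT15M
    steps:
      - name: Spark_job_test
        jar: "command-runner.jar"
        actionOnFailure: CONTINUE
        commands:
          - spark-submit s3://mybucket/health_violations.py
```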

Possible Values
TERMINATE_CLUSTER, CANCEL_AND_WAIT, CONTINUE, TERMINATE_JOB_FLOW

Action on failure.

Possible values: TERMINATE_CLUSTER, CANCEL_AND_WAIT, CONTINUE, TERMINATE_JOB_FLOW.

JAR path.

A path to a JAR file run during the step.

Step configuration name.

Ex: "Run Spark job"

SubType string

Commands.

A list of commands that will be passed to the JAR file's main function when executed.

Main class.

The name of the main class in the specified Java file. If not specified, the JAR file should specify a Main-Class in its manifest file.
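
The step properties above could be combined as in the following fragment, which runs a custom JAR rather than `command-runner.jar`. This is a sketch only: the property name `mainClass` is assumed from the "Main class" title, the JAR path and class name are hypothetical, and how the task splits each command string into arguments is not documented on this page.

```yaml
steps:
  - name: Run_custom_jar
    # Hypothetical JAR location
    jar: "s3://mybucket/jars/my-app.jar"
    # Assumed property name; can be left out if the JAR manifest declares a Main-Class
    mainClass: "com.example.Main"
    # Other steps are cancelled and the cluster waits if this step fails
    actionOnFailure: CANCEL_AND_WAIT
    commands:
      - --data_source s3://mybucket/food_establishment_data.csv --output_uri s3://mybucket/test-emr-output
```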