RSparkSubmit
Certified

Submit an Apache SparkR batch workload to a Google Cloud Dataproc cluster.

For more details, check out the Apache SparkR documentation.

```yaml
type: "io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit"
```

Examples

```yaml
id: gcp_dataproc_r_spark_submit
namespace: company.team
tasks:
  - id: r_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
    mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
    name: test-rspark
    region: europe-west3
```

Properties

mainRFileUri
Type: string

The HCFS URI of the main R file to use as the driver. Must be a .R or .r file.

name
Type: string

The batch name.

region
Type: string

The Dataproc region in which to run the batch workload.

archiveUris
Type: array
SubType: string

HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

args
Type: array
SubType: string

The arguments to pass to the driver.

Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur and cause an incorrect batch submission.
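For illustration, a minimal sketch of passing driver arguments, assuming the property is named args as above; the argument values are hypothetical:

```yaml
- id: r_spark_submit
  type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
  mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
  name: test-rspark
  region: europe-west3
  args:
    - "--input=gs://my-bucket/input.csv" # hypothetical application flag
    - "--mode=daily"                     # hypothetical application flag
```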

executionConfiguration
Type: object

Execution configuration for a workload.

Definitions
kmsKey
Type: string

The Cloud KMS key to use for encryption.

networkTags
Type: array
SubType: string

Tags used for network traffic control.

networkUri
Type: string

Network URI to connect workload to.

serviceAccountEmail
Type: string

Service account used to execute workload.

subnetworkUri
Type: string

Subnetwork URI to connect workload to.
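A minimal sketch of an executionConfiguration block, assuming the fields above map directly to nested YAML keys; the project, subnetwork, and KMS key names are hypothetical:

```yaml
executionConfiguration:
  serviceAccountEmail: batch-runner@my-project.iam.gserviceaccount.com
  subnetworkUri: projects/my-project/regions/europe-west3/subnetworks/default
  networkTags:
    - dataproc-batch
  kmsKey: projects/my-project/locations/europe-west3/keyRings/my-ring/cryptoKeys/my-key
```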

fileUris
Type: array
SubType: string

HCFS URIs of files to be placed in the working directory of each executor.

impersonatedServiceAccount
Type: string

The GCP service account to impersonate.

jarFileUris
Type: array
SubType: string

HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be GCS files with the gs:// prefix, HDFS files on the cluster with the hdfs:// prefix, or local files on the cluster with the file:// prefix.
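To illustrate the three URI schemes, a hedged sketch; all paths are hypothetical:

```yaml
jarFileUris:
  - 'gs://my-bucket/libs/helper.jar' # object in Google Cloud Storage
fileUris:
  - 'hdfs:///data/lookup.csv'        # file on the cluster's HDFS
  - 'file:///opt/config/app.conf'    # local file on the cluster
```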

peripheralsConfiguration
Type: object

Peripherals configuration for a workload.

Definitions
metastoreService
Type: string

Resource name of an existing Dataproc Metastore service.

Example: projects/[project_id]/locations/[region]/services/[service_id]

sparkHistoryServer
Type: object

Spark History Server configuration for the workload.

dataprocCluster
Type: string

Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.

Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
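A minimal sketch of a peripheralsConfiguration block, assuming the same nesting as the definitions above; the resource names are hypothetical:

```yaml
peripheralsConfiguration:
  metastoreService: projects/my-project/locations/europe-west3/services/my-metastore
  sparkHistoryServer:
    dataprocCluster: projects/my-project/regions/europe-west3/clusters/my-history-cluster
```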

projectId
Type: string

The GCP project ID.

runtimeConfiguration
Type: object

Runtime configuration for a workload.

Definitions
containerImage
Type: string

Optional custom container image for the job runtime environment.

If not specified, a default container image will be used.

properties
Type: object
SubType: string

Properties used to configure the workload execution (a map of key/value pairs).

version
Type: string

Version of the batch runtime.
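A minimal sketch of a runtimeConfiguration block; the runtime version, image name, and Spark properties are illustrative, not defaults:

```yaml
runtimeConfiguration:
  version: "2.1"                                   # hypothetical batch runtime version
  containerImage: gcr.io/my-project/spark-r:latest # hypothetical custom image
  properties:
    spark.executor.instances: "4"
    spark.driver.memory: "4g"
```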

scopes
Type: array
SubType: string
Default: ["https://www.googleapis.com/auth/cloud-platform"]

The GCP scopes to be used.

serviceAccount
Type: string

The GCP service account.
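A hedged sketch of the authentication-related properties, assuming serviceAccount takes a service account key JSON, here read from a Kestra secret; the secret name and project ID are hypothetical:

```yaml
- id: r_spark_submit
  type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
  mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
  name: test-rspark
  region: europe-west3
  projectId: my-project
  serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}" # assumption: key JSON supplied via a Kestra secret
```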

Outputs

state
Type: string
Possible values: STATE_UNSPECIFIED, PENDING, RUNNING, CANCELLING, CANCELLED, SUCCEEDED, FAILED, UNRECOGNIZED

The state of the batch.
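To show how the batch state could be consumed downstream, a sketch assuming the output is exposed as state (the output name is an assumption):

```yaml
tasks:
  - id: r_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
    mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
    name: test-rspark
    region: europe-west3
  - id: log_state
    type: io.kestra.plugin.core.log.Log
    message: "Batch finished with state {{ outputs.r_spark_submit.state }}" # 'state' output name assumed
```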