
RSparkSubmit
Submit an Apache SparkR batch workload to a Google Cloud Dataproc cluster.
For more details, check out the Apache SparkR documentation.
type: "io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit"Examples
id: gcp_dataproc_r_spark_submit
namespace: company.team

tasks:
  - id: r_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
    mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
    name: test-rspark
    region: europe-west3
Properties
mainRFileUri (string, required)
The HCFS URI of the main R file to use as the driver. Must be a .R or .r file.
name (string, required)
The batch name.
region (string, required)
The GCP region in which to run the batch workload, for example europe-west3.
archiveUris (array)
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
args (array)
The arguments to pass to the driver.
Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
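For example, positional arguments can be passed to the driver script as a YAML list. A minimal sketch, assuming a hypothetical R script that reads an input path and writes to an output path:

tasks:
  - id: r_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
    mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
    name: test-rspark
    region: europe-west3
    args:
      # Hypothetical positional arguments consumed by the R script
      - gs://my-bucket/input.csv
      - gs://my-bucket/output/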
execution (object)
Execution configuration for a workload.
Type: io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-ExecutionConfiguration
kmsKey (string): The Cloud KMS key to use for encryption.
networkTags (array): Tags used for network traffic control.
networkUri (string): Network URI to connect the workload to.
serviceAccount (string): Service account used to execute the workload.
subnetworkUri (string): Subnetwork URI to connect the workload to.
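A sketch of an execution configuration nested under the RSparkSubmit task, assuming the sub-property names listed above; the network, tag, service account, and KMS key values are placeholders:

execution:
  serviceAccount: spark-runner@my-project.iam.gserviceaccount.com  # placeholder account
  networkUri: default                                              # placeholder network
  networkTags:
    - dataproc
  kmsKey: projects/my-project/locations/europe-west3/keyRings/my-ring/cryptoKeys/my-key  # placeholder key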
fileUris (array)
HCFS URIs of files to be placed in the working directory of each executor.
impersonatedServiceAccount (string)
The GCP service account to impersonate.
jarFileUris (array)
HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
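For example, jar dependencies and side files can mix the supported HCFS schemes. A sketch with hypothetical bucket and file names, nested under the task:

jarFileUris:
  - gs://my-bucket/libs/custom-udfs.jar  # jar fetched from GCS
fileUris:
  - hdfs:///data/lookup.csv              # file already on the cluster's HDFS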
peripherals (object)
Peripherals configuration for a workload.
Type: io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-PeripheralsConfiguration
metastoreService (string): Resource name of an existing Dataproc Metastore service. Example: projects/[project_id]/locations/[region]/services/[service_id]
sparkHistoryServerConfig (io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-SparkHistoryServerConfiguration)
dataprocCluster (string): Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
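A sketch wiring the batch to an existing metastore and Spark History Server, assuming the sub-property names above; all resource names are placeholders:

peripherals:
  # Placeholder Dataproc Metastore service
  metastoreService: projects/my-project/locations/europe-west3/services/my-metastore
  sparkHistoryServerConfig:
    # Placeholder cluster acting as the Spark History Server
    dataprocCluster: projects/my-project/regions/europe-west3/clusters/history-server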
projectId (string)
The GCP project ID.
runtime (object)
Runtime configuration for a workload.
Type: io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-RuntimeConfiguration
containerImage (string): Optional custom container image for the job runtime environment. If not specified, a default container image will be used.
properties (object): Properties used to configure the workload execution (map of key/value pairs).
version (string): Version of the batch runtime.
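A sketch of a runtime configuration nested under the task; the image URI, Spark property, and version values are illustrative, not defaults:

runtime:
  containerImage: gcr.io/my-project/spark-r:latest  # hypothetical custom image
  properties:
    spark.executor.memory: 4g                       # example Spark property
  version: "2.2"                                    # example batch runtime version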
scopes (array)
The GCP scopes to be used. Default: ["https://www.googleapis.com/auth/cloud-platform"]
serviceAccount (string)
The GCP service account.
Outputs
state (string)
The state of the batch. Possible values: STATE_UNSPECIFIED, PENDING, RUNNING, CANCELLING, CANCELLED, SUCCEEDED, FAILED, UNRECOGNIZED.
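Downstream tasks can reference this output through Kestra's expression syntax. A minimal sketch, assuming the core Log task is available:

tasks:
  - id: r_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.RSparkSubmit
    mainRFileUri: 'gs://spark-jobs-kestra/dataframe.r'
    name: test-rspark
    region: europe-west3

  # Log the final batch state reported by the task above
  - id: log_state
    type: io.kestra.plugin.core.log.Log
    message: "Batch finished in state {{ outputs.r_spark_submit.state }}"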