SparkSubmit
Submit an Apache Spark batch workload to a Google Cloud Dataproc cluster.
For more details, check out the Apache Spark documentation.
type: "io.kestra.plugin.gcp.dataproc.batches.SparkSubmit"
Examples
id: gcp_dataproc_spark_submit
namespace: company.team

tasks:
  - id: spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.SparkSubmit
    jarFileUris:
      - 'gs://spark-jobs-kestra/spark-examples.jar'
    mainClass: org.apache.spark.examples.SparkPi
    args:
      - 1000
    name: test-spark
    region: europe-west3
Properties
mainClass string (required)
The name of the driver main class.
The jar file that contains the class must be on the classpath or specified in jarFileUris.
name string (required)
The batch name.
region string (required)
The GCP region in which to run the batch.
archiveUris array
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
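For example, a task can ship a zip of lookup data that gets extracted into each executor's working directory. A minimal sketch; the bucket, jar, and class names are placeholders:

- id: spark_submit_with_archives
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSubmit
  jarFileUris:
    - 'gs://my-bucket/spark-job.jar' # placeholder jar
  mainClass: com.example.Job # placeholder main class
  archiveUris:
    - 'gs://my-bucket/lookup-data.zip' # extracted into each executor's working directory
  name: spark-with-archives
  region: europe-west3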
args array
The arguments to pass to the driver.
Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
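Spark settings that would normally ride along as --conf flags can instead go through the runtime properties map described below. A sketch; the jar, class, and memory value are illustrative:

- id: spark_submit_tuned
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSubmit
  jarFileUris:
    - 'gs://my-bucket/spark-job.jar' # placeholder
  mainClass: com.example.Job # placeholder
  args:
    - '1000' # plain application arguments only; no --conf here
  runtime:
    properties:
      spark.executor.memory: '4g' # Spark conf goes here instead of --conf
  name: spark-tuned
  region: europe-west3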
execution AbstractBatch-ExecutionConfiguration
Execution configuration for a workload.
fileUris array
HCFS URIs of files to be placed in the working directory of each executor.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
impersonatedServiceAccount string
The GCP service account to impersonate.
jarFileUris array
HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
peripherals AbstractBatch-PeripheralsConfiguration
Peripherals configuration for a workload.
projectId string
The GCP project ID.
runtime AbstractBatch-RuntimeConfiguration
Runtime configuration for a workload.
scopes array
The GCP scopes to be used.
Default: ["https://www.googleapis.com/auth/cloud-platform"]
serviceAccount string
The GCP service account.
Outputs
state string
The state of the batch.
Possible values: STATE_UNSPECIFIED, PENDING, RUNNING, CANCELLING, CANCELLED, SUCCEEDED, FAILED, UNRECOGNIZED.
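A downstream task can read this output with Kestra's standard expression syntax; for example, logging the final state (the task id spark_submit matches the example above):

- id: log_state
  type: io.kestra.plugin.core.log.Log
  message: 'Spark batch finished in state {{ outputs.spark_submit.state }}'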
Definitions
io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-PeripheralsConfiguration
metastoreService string
Resource name of an existing Dataproc Metastore service.
Example: projects/[project_id]/locations/[region]/services/[service_id]
sparkHistoryServer AbstractBatch-SparkHistoryServerConfiguration
Spark History Server configuration for the workload. See the AbstractBatch-SparkHistoryServerConfiguration definition below.
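A sketch of the peripherals block on a SparkSubmit task; the project, region, and resource names are placeholders:

peripherals:
  metastoreService: projects/my-project/locations/europe-west3/services/my-metastore
  sparkHistoryServer:
    dataprocCluster: projects/my-project/regions/europe-west3/clusters/my-history-cluster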
io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-RuntimeConfiguration
containerImage string
Optional custom container image for the job runtime environment.
If not specified, a default container image will be used.
properties object
Properties used to configure the workload execution (a map of key/value pairs).
version string
Version of the batch runtime.
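A sketch of the runtime block; the image, version, and property values are illustrative:

runtime:
  containerImage: gcr.io/my-project/custom-spark:latest # placeholder custom image
  version: '2.2' # batch runtime version; check Dataproc Serverless docs for supported versions
  properties:
    spark.executor.cores: '4' # key/value pairs applied to the workload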
io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-SparkHistoryServerConfiguration
dataprocCluster string
Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.
Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-ExecutionConfiguration
kmsKey string
The Cloud KMS key to use for encryption.
networkTags array
Tags used for network traffic control.
networkUri string
Network URI to connect workload to.
serviceAccountEmail string
Service account used to execute workload.
subnetworkUri string
Subnetwork URI to connect workload to.
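A sketch of the execution block; all resource names below are placeholders:

execution:
  serviceAccountEmail: spark-batch@my-project.iam.gserviceaccount.com
  networkTags:
    - spark-batch
  subnetworkUri: projects/my-project/regions/europe-west3/subnetworks/my-subnet
  kmsKey: projects/my-project/locations/europe-west3/keyRings/my-ring/cryptoKeys/my-key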