
PySparkSubmit

```yaml
type: "io.kestra.plugin.gcp.dataproc.batches.PySparkSubmit"
```

Submit an Apache PySpark batch workload.

Examples

```yaml
id: "py_spark_submit"
type: "io.kestra.plugin.gcp.dataproc.batches.PySparkSubmit"
mainPythonFileUri: 'gs://spark-jobs-kestra/pi.py'
name: test-pyspark
```
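
In a complete flow, the task usually also sets the project and region. A sketch (the flow id, namespace, project, and region below are illustrative):

```yaml
id: dataproc_pyspark
namespace: company.team

tasks:
  - id: py_spark_submit
    type: io.kestra.plugin.gcp.dataproc.batches.PySparkSubmit
    projectId: my-gcp-project   # illustrative project id
    region: europe-west1        # illustrative region
    mainPythonFileUri: 'gs://spark-jobs-kestra/pi.py'
    name: test-pyspark
```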

Properties

mainPythonFileUri

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

name

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The batch name.

region

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The Dataproc region to submit the batch to.

archiveUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

args

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

The arguments to pass to the driver.

Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
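
For instance, positional arguments for the driver script can be passed as a plain list (the values below are illustrative); Spark settings such as `--conf` should instead go through the runtime `properties` map:

```yaml
args:
  - "--input"
  - "gs://my-bucket/input/"   # illustrative bucket path
  - "--iterations"
  - "1000"
```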

execution

Execution configuration for a workload.

fileUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

HCFS URIs of files to be placed in the working directory of each executor.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

jarFileUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. They can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

peripherals

Peripherals configuration for a workload.
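
A peripherals block might look like the following sketch (all resource names are illustrative; see the PeripheralsConfiguration definition below for the fields):

```yaml
peripherals:
  metastoreService: projects/my-project/locations/europe-west1/services/my-metastore
  sparkHistoryServer:
    dataprocCluster: projects/my-project/regions/europe-west1/clusters/my-history-cluster
```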

projectId

  • Type: string
  • Dynamic: ✔️
  • Required:

The GCP project ID.

runtime

Runtime configuration for a workload.
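
As a sketch, the runtime block can pin a batch runtime version and pass Spark properties (the values below are illustrative; see the RuntimeConfiguration definition below):

```yaml
runtime:
  version: "2.1"                      # illustrative batch runtime version
  properties:
    spark.executor.instances: "4"     # illustrative Spark property
```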

scopes

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [https://www.googleapis.com/auth/cloud-platform]

The GCP scopes to use.

serviceAccount

  • Type: string
  • Dynamic: ✔️
  • Required:

The GCP service account key
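
The key is typically injected from a Kestra secret rather than inlined in the flow, for example:

```yaml
serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT') }}"
```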

Outputs

state

  • Type: string
  • Possible Values:
    • STATE_UNSPECIFIED
    • PENDING
    • RUNNING
    • CANCELLING
    • CANCELLED
    • SUCCEEDED
    • FAILED
    • UNRECOGNIZED

The state of the batch.

Definitions

PeripheralsConfiguration

metastoreService

  • Type: string
  • Dynamic: ✔️
  • Required:

Resource name of an existing Dataproc Metastore service.

Example: projects/[project_id]/locations/[region]/services/[service_id]

sparkHistoryServer

Spark History Server configuration for the workload.

RuntimeConfiguration

containerImage

  • Type: string
  • Dynamic: ✔️
  • Required:

Optional custom container image for the job runtime environment.

If not specified, a default container image will be used.

properties

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Properties used to configure the workload execution (a map of key/value pairs).

version

  • Type: string
  • Dynamic: ✔️
  • Required:

Version of the batch runtime.

SparkHistoryServerConfiguration

dataprocCluster

  • Type: string
  • Dynamic: ✔️
  • Required:

Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.

Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]

ExecutionConfiguration

kmsKey

  • Type: string
  • Dynamic: ✔️
  • Required:

The Cloud KMS key to use for encryption.

networkTags

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Tags used for network traffic control.

networkUri

  • Type: string
  • Dynamic: ✔️
  • Required:

Network URI to connect workload to.

serviceAccountEmail

  • Type: string
  • Dynamic: ✔️
  • Required:

Service account used to execute workload.

subnetworkUri

  • Type: string
  • Dynamic: ✔️
  • Required:

Subnetwork URI to connect workload to.
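
Putting the execution settings together, a block might look like this sketch (all resource names are illustrative):

```yaml
execution:
  serviceAccountEmail: batch-runner@my-project.iam.gserviceaccount.com
  subnetworkUri: projects/my-project/regions/europe-west1/subnetworks/my-subnet
  networkTags:
    - dataproc
  kmsKey: projects/my-project/locations/europe-west1/keyRings/my-ring/cryptoKeys/my-key
```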