SparkSqlSubmit
Submit Apache Spark SQL queries as a batch workload to Google Cloud Dataproc.
type: "io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit"
id: gcp_dataproc_spark_sql_submit
namespace: company.team

tasks:
  - id: spark_sql_submit
    type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
    queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
    name: test-sparksql
    region: europe-west3
Properties

Dynamic: YES
The batch name.

Dynamic: YES
The HCFS URI of the script that contains Spark SQL queries to execute.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

Dynamic: YES
The region.
Dynamic: YES
HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
Dynamic: YES
The arguments to pass to the driver.
Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
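For illustration, a task that passes driver arguments could look like the sketch below. Only queryFileUri, name, and region appear in the example above; the args key is an assumption to be checked against the plugin schema, and the argument value is a placeholder.

- id: spark_sql_submit_with_args
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
  queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
  name: test-sparksql-args
  region: europe-west3
  args:                      # assumed property name for the driver arguments
    - "--date=2024-01-01"    # placeholder argument; do not pass --conf here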
Dynamic: YES
Execution configuration for a workload.

Dynamic: YES
HCFS URIs of files to be placed in the working directory of each executor.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

Dynamic: YES
The GCP service account to impersonate.

Dynamic: YES
HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. Can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
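As a sketch of how the HCFS URI properties described above could be combined in one task: the keys jarFileUris, fileUris, and archiveUris are assumptions (they do not appear in the example above), and the bucket paths are placeholders.

- id: spark_sql_submit_with_deps
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
  queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
  name: test-sparksql-deps
  region: europe-west3
  jarFileUris:                                   # assumed key: jars added to the driver and task classpath
    - 'gs://spark-jobs-kestra/libs/udfs.jar'
  fileUris:                                      # assumed key: files placed in each executor's working directory
    - 'gs://spark-jobs-kestra/conf/lookup.csv'
  archiveUris:                                   # assumed key: archives extracted into each executor's working directory
    - 'gs://spark-jobs-kestra/deps/env.tar.gz'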
Dynamic: YES
Peripherals configuration for a workload.

Dynamic: YES
The GCP project ID.

Dynamic: YES
Runtime configuration for a workload.

Dynamic: YES
The GCP scopes to be used.
Default: ["https://www.googleapis.com/auth/cloud-platform"]

Dynamic: YES
The GCP service account.
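The authentication properties above follow the pattern shared by Kestra GCP tasks. A minimal sketch, assuming the conventional keys projectId, serviceAccount, and scopes; the project ID is a placeholder and the service account key is read from a Kestra secret.

- id: spark_sql_submit_auth
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
  queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
  name: test-sparksql-auth
  region: europe-west3
  projectId: my-gcp-project                                    # placeholder GCP project ID
  serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"   # service account key stored as a secret
  scopes:
    - "https://www.googleapis.com/auth/cloud-platform"         # same as the documented default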
Outputs

The state of the batch.
Possible values: STATE_UNSPECIFIED, PENDING, RUNNING, CANCELLING, CANCELLED, SUCCEEDED, FAILED, UNRECOGNIZED
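Downstream tasks can read this value from the task outputs. A minimal sketch, assuming the output key is state and reusing the task ID from the example above; the core Log task is used only for illustration.

- id: log_batch_state
  type: io.kestra.plugin.core.log.Log
  message: "Dataproc batch finished with state {{ outputs.spark_sql_submit.state }}"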
Definitions

Dynamic: YES
Resource name of an existing Dataproc Metastore service.
Example: projects/[project_id]/locations/[region]/services/[service_id]

Dynamic: YES
Resource name of an existing Dataproc Metastore service.
Example: projects/[project_id]/locations/[region]/services/[service_id]
Dynamic: YES
Optional custom container image for the job runtime environment.
If not specified, a default container image will be used.

Dynamic: YES
Properties used to configure the workload execution (map of key/value pairs).

Dynamic: YES
Version of the batch runtime.
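A sketch combining the three runtime fields described above (custom container image, workload properties, and runtime version). Nesting them under a runtime key is an assumption, and the image, version, and property values are placeholders.

- id: spark_sql_submit_runtime
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
  queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
  name: test-sparksql-runtime
  region: europe-west3
  runtime:                                                     # assumed key for the runtime configuration block
    containerImage: 'gcr.io/my-project/custom-spark:latest'    # placeholder custom image; omit to use the default
    version: '2.1'                                             # placeholder batch runtime version
    properties:                                                # map of key/value workload properties
      spark.executor.instances: '4'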
Dynamic: YES
Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.
Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]

Dynamic: YES
The Cloud KMS key to use for encryption.

Dynamic: YES
Tags used for network traffic control.

Dynamic: YES
Network URI to connect workload to.

Dynamic: YES
Service account used to execute workload.

Dynamic: YES
Subnetwork URI to connect workload to.
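Putting the peripherals and execution settings together, a hedged sketch: the peripherals and execution keys and their nested field names are assumptions derived from the descriptions above, and every resource name is a placeholder.

- id: spark_sql_submit_network
  type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
  queryFileUri: 'gs://spark-jobs-kestra/foobar.py'
  name: test-sparksql-network
  region: europe-west3
  peripherals:                                                 # assumed key for the peripherals configuration block
    metastoreService: projects/my-project/locations/europe-west3/services/my-metastore
  execution:                                                   # assumed key for the execution configuration block
    networkTags:                                               # tags used for network traffic control
      - dataproc
    subnetworkUri: my-subnet                                   # subnetwork to connect the workload to
    kmsKey: projects/my-project/locations/europe-west3/keyRings/my-ring/cryptoKeys/my-key   # Cloud KMS key for encryption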