```yaml
type: "io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit"
```

Submit Apache Spark SQL queries as a batch workload to Google Cloud Dataproc.

Examples

```yaml
id: gcp_dataproc_spark_sql_submit
namespace: company.team
tasks:
  - id: spark_sql_submit
    type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
    queryFileUri: 'gs://spark-jobs-kestra/foobar.sql'
    name: test-sparksql
    region: europe-west3
```
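
A fuller, hedged sketch combining several of the optional properties documented below (the bucket, file names, project ID, and argument are placeholders, not values required by the plugin):

```yaml
id: gcp_dataproc_spark_sql_submit_full
namespace: company.team
tasks:
  - id: spark_sql_submit
    type: io.kestra.plugin.gcp.dataproc.batches.SparkSqlSubmit
    name: test-sparksql
    region: europe-west3
    projectId: my-gcp-project                            # placeholder project ID
    queryFileUri: 'gs://spark-jobs-kestra/queries.sql'   # placeholder query file
    jarFileUris:
      - 'gs://spark-jobs-kestra/libs/custom-udfs.jar'    # placeholder jar with UDFs
    args:
      - '--date=2024-01-01'                              # hypothetical argument read by your queries
```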

Properties

name

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The batch name.

queryFileUri

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The HCFS URI of the script that contains Spark SQL queries to execute.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. This can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
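
All three URI schemes are accepted; a short sketch with placeholder paths (keep only the variant you need):

```yaml
queryFileUri: 'gs://my-bucket/queries/report.sql'    # GCS object
# queryFileUri: 'hdfs:///jobs/report.sql'            # HDFS file on the cluster
# queryFileUri: 'file:///opt/jobs/report.sql'        # local file on the cluster
```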

region

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The region.

archiveUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: ❌

HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. This can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.

args

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: ❌

The arguments to pass to the driver.

Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
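
A hedged sketch of passing driver arguments (the flag names are hypothetical and would be interpreted by your own queries or driver code, not by the plugin):

```yaml
args:
  - '--date=2024-01-01'
  - '--dry-run'
```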

execution

Execution configuration for a workload. See the AbstractBatch-ExecutionConfiguration definition below.
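
A minimal sketch of an execution block, using the fields listed under the ExecutionConfiguration definition (the service account, subnetwork, and network tag values are placeholders):

```yaml
execution:
  serviceAccountEmail: batch-runner@my-gcp-project.iam.gserviceaccount.com          # placeholder
  subnetworkUri: projects/my-gcp-project/regions/europe-west3/subnetworks/dataproc  # placeholder
  networkTags:
    - dataproc-batch                                                                # placeholder tag
```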

fileUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: ❌

HCFS URIs of files to be placed in the working directory of each executor.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. This can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
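
A hedged sketch covering fileUris together with the archiveUris property described above (bucket and file names are placeholders):

```yaml
fileUris:
  - 'gs://my-bucket/config/lookup.csv'      # copied into each executor's working directory
archiveUris:
  - 'gs://my-bucket/deps/resources.zip'     # extracted into each executor's working directory
```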

impersonatedServiceAccount

  • Type: string
  • Dynamic: ✔️
  • Required: ❌

The GCP service account to impersonate.

jarFileUris

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: ❌

HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

Hadoop Compatible File System (HCFS) URIs should be accessible from the cluster. This can be a GCS file with the gs:// prefix, an HDFS file on the cluster with the hdfs:// prefix, or a local file on the cluster with the file:// prefix.
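
A short sketch (the jar path is a placeholder):

```yaml
jarFileUris:
  - 'gs://my-bucket/libs/custom-udfs.jar'   # added to the driver and executor classpath
```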

peripherals

Peripherals configuration for a workload. See the AbstractBatch-PeripheralsConfiguration definition below.

projectId

  • Type: string
  • Dynamic: ✔️
  • Required: ❌

The GCP project ID.

runtime

Runtime configuration for a workload. See the AbstractBatch-RuntimeConfiguration definition below.
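
A hedged sketch of a runtime block, based on the fields listed under the RuntimeConfiguration definition (the version, image, and Spark property values are assumptions, not defaults):

```yaml
runtime:
  version: '2.2'                                                 # assumed Dataproc Serverless runtime version
  containerImage: 'gcr.io/my-gcp-project/spark-custom:latest'    # placeholder custom container image
  properties:
    spark.executor.instances: '4'                                # hypothetical Spark property
```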

scopes

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: ❌
  • Default: [ "https://www.googleapis.com/auth/cloud-platform" ]

The GCP scopes to be used.

serviceAccount

  • Type: string
  • Dynamic: ✔️
  • Required: ❌

The GCP service account.
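
A hedged sketch of the authentication-related properties used together (the project ID and service account email are placeholders; the secret name is hypothetical):

```yaml
projectId: my-gcp-project
serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT') }}"                              # service account key JSON stored as a Kestra secret
impersonatedServiceAccount: batch-runner@my-gcp-project.iam.gserviceaccount.com   # placeholder
```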

Outputs

state

  • Type: string
  • Required: ❌
  • Possible Values:
    • STATE_UNSPECIFIED
    • PENDING
    • RUNNING
    • CANCELLING
    • CANCELLED
    • SUCCEEDED
    • FAILED
    • UNRECOGNIZED
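
The state output can be referenced from a downstream task in the same tasks list with Kestra's output expressions; a hedged sketch (the log task is illustrative):

```yaml
  - id: log_state
    type: io.kestra.plugin.core.log.Log
    message: "Dataproc batch finished with state {{ outputs.spark_sql_submit.state }}"
```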

Definitions

io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-PeripheralsConfiguration

io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-RuntimeConfiguration

  • containerImage
    • Type: string
    • Dynamic: ✔️
    • Required: ❌
  • properties
    • Type: object
    • SubType: string
    • Dynamic: ✔️
    • Required: ❌
  • version
    • Type: string
    • Dynamic: ✔️
    • Required: ❌

io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-SparkHistoryServerConfiguration

  • dataprocCluster
    • Type: string
    • Dynamic: ✔️
    • Required: ❌

io.kestra.plugin.gcp.dataproc.batches.AbstractBatch-ExecutionConfiguration

  • kmsKey
    • Type: string
    • Dynamic: ✔️
    • Required: ❌
  • networkTags
    • Type: array
    • SubType: string
    • Dynamic: ✔️
    • Required: ❌
  • networkUri
    • Type: string
    • Dynamic: ✔️
    • Required: ❌
  • serviceAccountEmail
    • Type: string
    • Dynamic: ✔️
    • Required: ❌
  • subnetworkUri
    • Type: string
    • Dynamic: ✔️
    • Required: ❌