PythonSubmit​Python​Submit

yaml
type: "io.kestra.plugin.spark.PythonSubmit"

Submit a PySpark job to a remote cluster.

Examples

yaml
id: spark_python_submit
namespace: company.team

tasks:
  - id: python_submit
    type: io.kestra.plugin.spark.PythonSubmit
    runner: DOCKER
    docker:
      networkMode: host
      user: root
    master: spark://localhost:7077
    args:
      - "10"
    mainScript: |
      import sys
      from random import random
      from operator import add
      from pyspark.sql import SparkSession


      if __name__ == "__main__":
          spark = SparkSession               .builder               .appName("PythonPi")               .getOrCreate()

          partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
          n = 100000 * partitions

          def f(_: int) -> float:
              x = random() * 2 - 1
              y = random() * 2 - 1
              return 1 if x ** 2 + y ** 2 <= 1 else 0

          count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
          print("Pi is roughly %f" % (4.0 * count / n))

          spark.stop()

Properties

mainScript

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The main Python script.

master

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

Spark master hostname for the application.

appFiles

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Adds a file to be submitted with the application.

Must be an internal storage URI.

args

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Command line arguments for the application.

configurations

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Configuration properties for the application.

containerImage

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: bitnami/spark

The task runner container image, only used if the task runner is container-based.

deployMode

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Possible Values:
    • CLIENT
    • CLUSTER

Deploy mode for the application.

docker

Deprecated, use 'taskRunner' instead

env

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Additional environment variables for the current process.

name

  • Type: string
  • Dynamic: ✔️
  • Required:

Spark application name.

pythonFiles

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Adds a Python file/zip/egg package to be submitted with the application.

Must be an internal storage URI.

runner

  • Type: string
  • Dynamic:
  • Required:
  • Possible Values:
    • PROCESS
    • DOCKER

Script runner to use.

Deprecated - use 'taskRunner' instead.

sparkSubmitPath

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: spark-submit

The spark-submit binary path.

taskRunner

  • Type: TaskRunner
  • Dynamic:
  • Required:
  • Default: { "type": "io.kestra.plugin.scripts.runner.docker.Docker" }

The task runner to use.

Task runners are provided by plugins, each have their own properties.

verbose

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Enables verbose reporting.

Outputs

exitCode

  • Type: integer
  • Required: ✔️
  • Default: 0

outputFiles

  • Type: object
  • SubType: string
  • Required:

vars

  • Type: object
  • Required:

Definitions

io.kestra.plugin.scripts.runner.docker.Cpu

  • cpus
    • Type: integer
    • Dynamic:
    • Required:

io.kestra.core.models.tasks.runners.TaskRunner

  • type
    • Type: string
    • Dynamic:
    • Required: ✔️
    • Validation RegExp: \p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*(\.\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*)*
    • Min length: 1

io.kestra.plugin.scripts.runner.docker.Memory

  • kernelMemory
    • Type: string
    • Dynamic: ✔️
    • Required:
  • memory
    • Type: string
    • Dynamic: ✔️
    • Required:
  • memoryReservation
    • Type: string
    • Dynamic: ✔️
    • Required:
  • memorySwap
    • Type: string
    • Dynamic: ✔️
    • Required:
  • memorySwappiness
    • Type: string
    • Dynamic: ✔️
    • Required:
  • oomKillDisable
    • Type: boolean
    • Dynamic:
    • Required:

io.kestra.plugin.scripts.exec.scripts.models.DockerOptions

  • image
    • Type: string
    • Dynamic: ✔️
    • Required: ✔️
    • Min length: 1
  • config
    • Type:
      • string
      • object
    • Dynamic: ✔️
    • Required:
  • cpu
    • Type: Cpu
    • Dynamic:
    • Required:
  • credentials
  • deviceRequests
  • entryPoint
    • Type: array
    • SubType: string
    • Dynamic: ✔️
    • Required:
  • extraHosts
    • Type: array
    • SubType: string
    • Dynamic: ✔️
    • Required:
  • host
    • Type: string
    • Dynamic: ✔️
    • Required:
  • memory
    • Type: Memory
    • Dynamic:
    • Required:
  • networkMode
    • Type: string
    • Dynamic: ✔️
    • Required:
  • pullPolicy
    • Type: string
    • Dynamic:
    • Required:
    • Default: ALWAYS
    • Possible Values:
      • IF_NOT_PRESENT
      • ALWAYS
      • NEVER
  • shmSize
    • Type: string
    • Dynamic: ✔️
    • Required:
  • user
    • Type: string
    • Dynamic: ✔️
    • Required:
  • volumes
    • Type: array
    • SubType: string
    • Dynamic: ✔️
    • Required:

io.kestra.plugin.scripts.runner.docker.Credentials

  • auth
    • Type: string
    • Dynamic: ✔️
    • Required:
  • identityToken
    • Type: string
    • Dynamic: ✔️
    • Required:
  • password
    • Type: string
    • Dynamic: ✔️
    • Required:
  • registry
    • Type: string
    • Dynamic: ✔️
    • Required:
  • registryToken
    • Type: string
    • Dynamic: ✔️
    • Required:
  • username
    • Type: string
    • Dynamic: ✔️
    • Required:

io.kestra.plugin.scripts.runner.docker.DeviceRequest

  • capabilities
    • Type: array
    • SubType: array
    • Dynamic:
    • Required:
  • count
    • Type: integer
    • Dynamic:
    • Required:
  • deviceIds
    • Type: array
    • SubType: string
    • Dynamic: ✔️
    • Required:
  • driver
    • Type: string
    • Dynamic: ✔️
    • Required:
  • options
    • Type: object
    • SubType: string
    • Dynamic:
    • Required: