PythonSubmit
type: "io.kestra.plugin.spark.PythonSubmit"
Submit a PySpark job to a remote cluster.
Examples
id: spark_python_submit
namespace: company.team
tasks:
- id: python_submit
type: io.kestra.plugin.spark.PythonSubmit
runner: DOCKER
docker:
networkMode: host
user: root
master: spark://localhost:7077
args:
- "10"
mainScript: |
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession
if __name__ == "__main__":
spark = SparkSession .builder .appName("PythonPi") .getOrCreate()
partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
n = 100000 * partitions
def f(_: int) -> float:
x = random() * 2 - 1
y = random() * 2 - 1
return 1 if x ** 2 + y ** 2 <= 1 else 0
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()
Properties
mainScript
- Type: string
- Dynamic: ✔️
- Required: ✔️
The main Python script.
master
- Type: string
- Dynamic: ✔️
- Required: ✔️
Spark master hostname for the application.
Spark master URL formats.
appFiles
- Type: object
- SubType: string
- Dynamic: ✔️
- Required: ❌
Adds a file to be submitted with the application.
Must be an internal storage URI.
args
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
Command line arguments for the application.
configurations
- Type: object
- SubType: string
- Dynamic: ✔️
- Required: ❌
Configuration properties for the application.
containerImage
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
bitnami/spark
The task runner container image, only used if the task runner is container-based.
deployMode
- Type: string
- Dynamic: ✔️
- Required: ❌
- Possible Values:
CLIENT
CLUSTER
Deploy mode for the application.
docker
- Type: DockerOptions
- Dynamic: ❌
- Required: ❌
Deprecated, use 'taskRunner' instead
env
- Type: object
- SubType: string
- Dynamic: ✔️
- Required: ❌
Additional environment variables for the current process.
name
- Type: string
- Dynamic: ✔️
- Required: ❌
Spark application name.
pythonFiles
- Type: object
- SubType: string
- Dynamic: ✔️
- Required: ❌
Adds a Python file/zip/egg package to be submitted with the application.
Must be an internal storage URI.
runner
- Type: string
- Dynamic: ❌
- Required: ❌
- Possible Values:
PROCESS
DOCKER
Script runner to use.
Deprecated - use 'taskRunner' instead.
sparkSubmitPath
- Type: string
- Dynamic: ✔️
- Required: ❌
- Default:
spark-submit
The spark-submit
binary path.
taskRunner
- Type: TaskRunner
- Dynamic: ❌
- Required: ❌
- Default:
{ "type": "io.kestra.plugin.scripts.runner.docker.Docker" }
The task runner to use.
Task runners are provided by plugins, each have their own properties.
verbose
- Type: boolean
- Dynamic: ❌
- Required: ❌
- Default:
false
Enables verbose reporting.
Outputs
exitCode
- Type: integer
- Required: ✔️
- Default:
0
outputFiles
- Type: object
- SubType: string
- Required: ❌
vars
- Type: object
- Required: ❌
Definitions
io.kestra.plugin.scripts.runner.docker.Cpu
cpus
- Type: integer
- Dynamic: ❌
- Required: ❌
io.kestra.core.models.tasks.runners.TaskRunner
type
- Type: string
- Dynamic: ❌
- Required: ✔️
- Validation RegExp:
\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*(\.\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*)*
- Min length:
1
io.kestra.plugin.scripts.runner.docker.Memory
kernelMemory
- Type: string
- Dynamic: ✔️
- Required: ❌
memory
- Type: string
- Dynamic: ✔️
- Required: ❌
memoryReservation
- Type: string
- Dynamic: ✔️
- Required: ❌
memorySwap
- Type: string
- Dynamic: ✔️
- Required: ❌
memorySwappiness
- Type: string
- Dynamic: ✔️
- Required: ❌
oomKillDisable
- Type: boolean
- Dynamic: ❌
- Required: ❌
io.kestra.plugin.scripts.exec.scripts.models.DockerOptions
image
- Type: string
- Dynamic: ✔️
- Required: ✔️
- Min length:
1
config
- Type:
- string
- object
- Dynamic: ✔️
- Required: ❌
- Type:
cpu
- Type: Cpu
- Dynamic: ❌
- Required: ❌
credentials
- Type: Credentials
- Dynamic: ✔️
- Required: ❌
deviceRequests
- Type: array
- SubType: DeviceRequest
- Dynamic: ❌
- Required: ❌
entryPoint
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
extraHosts
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
host
- Type: string
- Dynamic: ✔️
- Required: ❌
memory
- Type: Memory
- Dynamic: ❌
- Required: ❌
networkMode
- Type: string
- Dynamic: ✔️
- Required: ❌
pullPolicy
- Type: string
- Dynamic: ❌
- Required: ❌
- Default:
ALWAYS
- Possible Values:
IF_NOT_PRESENT
ALWAYS
NEVER
shmSize
- Type: string
- Dynamic: ✔️
- Required: ❌
user
- Type: string
- Dynamic: ✔️
- Required: ❌
volumes
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
io.kestra.plugin.scripts.runner.docker.Credentials
auth
- Type: string
- Dynamic: ✔️
- Required: ❌
identityToken
- Type: string
- Dynamic: ✔️
- Required: ❌
password
- Type: string
- Dynamic: ✔️
- Required: ❌
registry
- Type: string
- Dynamic: ✔️
- Required: ❌
registryToken
- Type: string
- Dynamic: ✔️
- Required: ❌
username
- Type: string
- Dynamic: ✔️
- Required: ❌
io.kestra.plugin.scripts.runner.docker.DeviceRequest
capabilities
- Type: array
- SubType: array
- Dynamic: ❌
- Required: ❌
count
- Type: integer
- Dynamic: ❌
- Required: ❌
deviceIds
- Type: array
- SubType: string
- Dynamic: ✔️
- Required: ❌
driver
- Type: string
- Dynamic: ✔️
- Required: ❌
options
- Type: object
- SubType: string
- Dynamic: ❌
- Required: ❌