JarSubmit

```yaml
type: "io.kestra.plugin.spark.JarSubmit"
```

Submit a Spark job packaged as a jar to a remote cluster.

Examples

```yaml
id: "jar_submit"
type: "io.kestra.plugin.spark.JarSubmit"
runner: DOCKER
dockerOptions:
  image: bitnami/spark
  entryPoint:
    - /bin/sh
    - -c
  user: root
master: spark://localhost:7077
# Quoted so that YAML does not parse the expression as a flow mapping
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
```

Properties

exitOnFailed

  • Type: boolean
  • Dynamic:
  • Required: ✔️
  • Default: true

Exit if any command returns a non-true value

This corresponds to the shell's -e option: the script exits as soon as any statement returns a non-true return value. The benefit of using -e is that it prevents errors from snowballing into serious issues when they could have been caught earlier.

interpreter

  • Type: string
  • Dynamic:
  • Required: ✔️
  • Default: /bin/sh
  • Min length: 1

Interpreter to use

mainClass

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The application class name for Java/Scala applications.

mainResource

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The main application resource

This should be the location of a jar file for Scala/Java applications, or a Python script for PySpark applications. Must be a Kestra internal storage URL.

master

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The Spark master hostname for the application.

runner

  • Type: string
  • Dynamic:
  • Required: ✔️
  • Default: PROCESS
  • Possible Values:
    • PROCESS
    • DOCKER

Runner to use

warningOnStdErr

  • Type: boolean
  • Dynamic:
  • Required: ✔️
  • Default: true

Use WARNING state if any stdErr is sent
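
As a minimal sketch of how this flag might be tuned: Spark's default logging configuration typically writes its log lines to standard error, so with the default value a submission that otherwise succeeds may still end in the WARNING state. The fragment below reuses the values from the example at the top of this page and only changes warningOnStdErr and verbose; whether this suits your cluster's logging setup is an assumption to verify.

```yaml
id: "jar_submit_quiet"
type: "io.kestra.plugin.spark.JarSubmit"
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
# Do not downgrade the task to WARNING just because lines were written to stderr
warningOnStdErr: false
# Optional: ask for verbose reporting while debugging
verbose: true
```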

appFiles

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Adds a file to be submitted with the application.

Must be a Kestra internal storage URL.

args

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Command line arguments for the application.

configurations

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Configuration values for the application.

deployMode

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Possible Values:
    • CLIENT
    • CLUSTER

Deploy mode for the application.
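
The optional application settings (name, args, configurations, deployMode) can be combined in one task. The sketch below is illustrative only: the application name and the argument values are hypothetical, and the two configuration keys are standard Spark properties chosen as examples rather than anything this plugin requires.

```yaml
id: "jar_submit_with_options"
type: "io.kestra.plugin.spark.JarSubmit"
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
name: "sample-app"            # hypothetical application name
deployMode: CLIENT            # one of CLIENT or CLUSTER
args:                         # hypothetical arguments passed to the main class
  - "--date"
  - "2024-01-01"
configurations:               # values are strings, as the SubType indicates
  spark.executor.memory: "2g"
  spark.executor.cores: "2"
```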

dockerOptions

Docker options when using runner DOCKER

env

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Additional environment variables to add for the current process.
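
A short sketch of the env map, assuming you want to point spark-submit at a custom configuration directory. SPARK_CONF_DIR is a standard Spark environment variable; the path shown is hypothetical.

```yaml
id: "jar_submit_with_env"
type: "io.kestra.plugin.spark.JarSubmit"
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
env:
  # Hypothetical path; keys and values are both strings
  SPARK_CONF_DIR: "/opt/spark/conf"
```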

files

🔒 Deprecated

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

The list of files that will be uploaded to internal storage.

Use the outputFiles property instead.

inputFiles

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Input files are extra files that will be available in the script working directory.

You can define the files as a map or as a JSON string. Each file can be defined inline or can reference a file from Kestra's internal storage.
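
The two ways of defining an input file can be sketched as follows. The file names and the inputs.lookup flow input are hypothetical; the first entry is defined inline, the second references a file held in Kestra's internal storage.

```yaml
id: "jar_submit_with_input_files"
type: "io.kestra.plugin.spark.JarSubmit"
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
inputFiles:
  # Inline content, written to app.properties in the working directory
  app.properties: |
    greeting=hello
  # Reference to a file in internal storage (hypothetical flow input)
  lookup.csv: "{{ inputs.lookup }}"
```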

interpreterArgs

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:
  • Default: [-c]

Interpreter arguments to use

jars

  • Type: object
  • SubType: string
  • Dynamic: ✔️
  • Required:

Adds jar files to be submitted with the application.

Must be a Kestra internal storage URL.
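
Both jars and appFiles are objects with string values. A minimal sketch, assuming each key is the file name to use and each value is the internal storage URL of that file; the key names and the flow inputs (inputs.helperJar, inputs.settings) are hypothetical.

```yaml
id: "jar_submit_with_dependencies"
type: "io.kestra.plugin.spark.JarSubmit"
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
jars:
  # Assumed shape: file name -> Kestra internal storage URL
  helper.jar: "{{ inputs.helperJar }}"
appFiles:
  settings.conf: "{{ inputs.settings }}"
```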

name

  • Type: string
  • Dynamic: ✔️
  • Required:

The application name.

outputDirs

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

List of output directories that will be uploaded to internal storage

List of keys that will generate temporary directories. In the command, you can reference each directory through the special variable outputDirs.key. For example, if you declare ["myDir"], you can run echo 1 >> {{ outputDirs.myDir }}/file1.txt and echo 2 >> {{ outputDirs.myDir }}/file2.txt, and both files will be uploaded to internal storage. You can then use them in other tasks.

outputFiles

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

List of output files that will be uploaded to internal storage

List of keys that will generate temporary files. In the command, you can reference each file through the special variable outputFiles.key. For example, if you declare ["first"], you can run echo 1 >> {{ outputFiles.first }} and then use the file in other tasks.

outputsFiles

🔒 Deprecated

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

Deprecated output files

Use outputFiles instead.

sparkSubmitPath

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: spark-submit

The spark-submit binary path.

verbose

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Enables verbose reporting

Outputs

exitCode

  • Type: integer
  • Default: 0

The exit code of the whole execution

files

🔒 Deprecated

  • Type: object
  • SubType: string

Deprecated output files

Use outputFiles instead.

outputFiles

  • Type: object
  • SubType: string

The URIs of the output files in Kestra internal storage

stdErrLineCount

  • Type: integer
  • Default: 0

The standard error line count

stdOutLineCount

  • Type: integer
  • Default: 0

The standard output line count

vars

  • Type: object

The values extracted from the output of the commands
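
The outputs listed above can be read from a later task with Kestra's outputs expression. The fragment below is a sketch: the second task's type (io.kestra.core.tasks.log.Log) is an assumption that depends on your Kestra version, and the message text is purely illustrative.

```yaml
tasks:
  - id: jar_submit
    type: io.kestra.plugin.spark.JarSubmit
    master: spark://localhost:7077
    mainResource: "{{ inputs.file }}"
    mainClass: spark.samples.App

  # Assumed logging task; replace with the log task available in your version
  - id: report
    type: io.kestra.core.tasks.log.Log
    message: "spark-submit exited with {{ outputs.jar_submit.exitCode }} ({{ outputs.jar_submit.stdErrLineCount }} stderr lines)"
```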

Definitions

DockerOptions-Memory

kernelMemory

  • Type: string
  • Dynamic: ✔️
  • Required:

The maximum amount of kernel memory the container can use.

The minimum allowed value is 4m. Because kernel memory cannot be swapped out, a container which is starved of kernel memory may block host machine resources, which can have side effects on the host machine and on other containers. See --kernel-memory details.

memory

  • Type: string
  • Dynamic: ✔️
  • Required:

The maximum amount of memory the container can use.

If you set this option, the value must be at least 6 megabytes.

memoryReservation

  • Type: string
  • Dynamic: ✔️
  • Required:

Allows you to specify a soft limit smaller than --memory which is activated when Docker detects contention or low memory on the host machine.

If you use memoryReservation, it must be set lower than memory for it to take precedence. Because it is a soft limit, it does not guarantee that the container doesn’t exceed the limit.

memorySwap

  • Type: string
  • Dynamic: ✔️
  • Required:

The amount of memory this container is allowed to swap to disk

If memory and memorySwap are set to the same value, this prevents containers from using any swap. This is because memorySwap is the amount of combined memory and swap that can be used, while memory is only the amount of physical memory that can be used.

memorySwappiness

  • Type: string
  • Dynamic: ✔️
  • Required:

The percentage of anonymous pages the host kernel is allowed to swap out for this container

By default, the host kernel can swap out a percentage of anonymous pages used by a container. You can set memorySwappiness to a value between 0 and 100, to tune this percentage.

oomKillDisable

  • Type: boolean
  • Dynamic:
  • Required:

By default, if an out-of-memory (OOM) error occurs, the kernel kills processes in a container.

To change this behavior, use the oomKillDisable option. Only disable the OOM killer on containers where you have also set the memory option. If the memory flag is not set, the host can run out of memory and the kernel may need to kill the host system’s processes to free memory.
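
The memory options above are nested under dockerOptions.memory. The sketch below applies two of the rules stated in this section: memorySwap equal to memory disables swap, and memoryReservation is set lower than memory so that it can take effect. The sizes are hypothetical and assume Docker's usual size notation (for example 4m, 2g).

```yaml
id: "jar_submit_memory_limited"
type: "io.kestra.plugin.spark.JarSubmit"
runner: DOCKER
dockerOptions:
  image: bitnami/spark
  memory:
    memory: "2g"              # hard limit
    memorySwap: "2g"          # same as memory: the container cannot use swap
    memoryReservation: "1g"   # soft limit, lower than memory
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
```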

DockerOptions

image

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️
  • Min length: 1

Docker image to use

cpu

Limits cpu usage.

By default, each container’s access to the host machine’s CPU cycles is unlimited. You can set various constraints to limit a given container’s access to the host machine’s CPU cycles.

deviceRequests

A list of requests for devices to be sent to device drivers

dockerConfig

  • Type: string
  • Dynamic: ✔️
  • Required:

Docker config file

Full file that can be used to configure private registries, ...

dockerHost

  • Type: string
  • Dynamic: ✔️
  • Required:

Docker API URI

entryPoint

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Docker entrypoint to use

extraHosts

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Docker extra host to use

memory

Limits memory usage.

Docker can enforce hard memory limits, which allow the container to use no more than a given amount of user or system memory, or soft limits, which allow the container to use as much memory as it needs unless certain conditions are met, such as when the kernel detects low memory or contention on the host machine. Some of these options have different effects when used alone or when more than one option is set.

networkMode

  • Type: string
  • Dynamic: ✔️
  • Required:

Docker network mode to use

pullImage

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: true

Whether the image must be pulled before starting the container

Mostly used for local image with registry

user

  • Type: string
  • Dynamic: ✔️
  • Required:

Docker user to use

volumes

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

List of volumes to mount

Each entry must be a valid mount expression given as a string, for example: /home/user:/app

Volume mounts are disabled by default for security reasons. To enable them, set kestra.tasks.scripts.docker.volume-enabled to true in the server configuration.
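
A sketch of both halves of this setup, shown as two YAML documents. The first is the task with the mount expression from the text above. The second assumes the server configuration is a YAML file in which the dotted key kestra.tasks.scripts.docker.volume-enabled is written in nested form; check your own configuration layout before copying it.

```yaml
# Task definition
id: "jar_submit_with_volume"
type: "io.kestra.plugin.spark.JarSubmit"
runner: DOCKER
dockerOptions:
  image: bitnami/spark
  volumes:
    - /home/user:/app
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
---
# Server configuration (assumed nested form of the dotted key)
kestra:
  tasks:
    scripts:
      docker:
        volume-enabled: true
```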

DockerOptions-Cpu

cpus

  • Type: integer
  • Dynamic:
  • Required:

Specify how much of the available CPU resources a container can use.

For instance, if the host machine has two CPUs and you set cpus:"1.5", the container is guaranteed at most one and a half of the CPUs.
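
A minimal sketch of the CPU limit, nested under dockerOptions.cpu. Because the property type is listed as integer, the example uses a whole number; the value itself is hypothetical.

```yaml
id: "jar_submit_cpu_limited"
type: "io.kestra.plugin.spark.JarSubmit"
runner: DOCKER
dockerOptions:
  image: bitnami/spark
  cpu:
    cpus: 1   # the container can use at most one CPU
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
```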

DockerOptions-DeviceRequest

capabilities

  • Type: array
  • SubType: array
  • Dynamic:
  • Required:

A list of capabilities; an OR list of AND lists of capabilities.

count

  • Type: integer
  • Dynamic:
  • Required:

A request for devices to be sent to device drivers

deviceIds

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

A request for devices to be sent to device drivers

driver

  • Type: string
  • Dynamic:
  • Required:

A request for devices to be sent to device drivers

options

  • Type: object
  • SubType: string
  • Dynamic:
  • Required:

Driver-specific options, specified as key/value pairs.

These options are passed directly to the driver.
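
The device request fields above might be combined as in the sketch below, which asks for one GPU. The driver name nvidia and the gpu capability are assumptions borrowed from Docker's GPU support and are not confirmed by this page; the image would also need to support the requested device.

```yaml
id: "jar_submit_with_gpu"
type: "io.kestra.plugin.spark.JarSubmit"
runner: DOCKER
dockerOptions:
  image: bitnami/spark
  deviceRequests:
    - driver: nvidia        # assumed driver name
      count: 1              # number of devices requested
      capabilities:         # OR list of AND lists
        - [gpu]
master: spark://localhost:7077
mainResource: "{{ inputs.file }}"
mainClass: spark.samples.App
```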