CreateJob

```yaml
type: "io.kestra.plugin.databricks.job.CreateJob"
```

Create a Databricks job and run it. Set waitForCompletion to the desired maximum duration if you want the task to wait for the job completion (e.g., PT1H to wait up to one hour).

Examples

Create a Databricks job, run it, and wait up to five minutes for its completion

```yaml
id: createJob
type: io.kestra.plugin.databricks.job.CreateJob
authentication:
  token: <your-token>
host: <your-host>
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: taskKey
    sparkPythonTask:
      pythonFile: /Shared/hello.py
      sparkPythonTaskSource: WORKSPACE
waitForCompletion: PT5M
```

Properties

jobTasks

  • Type: array
  • SubType: JobTaskSetting
  • Dynamic:
  • Required: ✔️
  • Min items: 1

The job tasks. If multiple tasks are defined, you must set dependsOn on each task.
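
For instance, a two-task job could be wired together with dependsOn as sketched below (the task keys, cluster identifier, and file paths are placeholders, not values from this documentation):

```yaml
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: extract
    sparkPythonTask:
      pythonFile: /Shared/extract.py
      sparkPythonTaskSource: WORKSPACE
  - existingClusterId: <your-cluster>
    taskKey: transform
    dependsOn:
      - extract
    sparkPythonTask:
      pythonFile: /Shared/transform.py
      sparkPythonTaskSource: WORKSPACE
```

Here the transform task declares dependsOn with the taskKey of the extract task, so Databricks runs them in order.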

accountId

  • Type: string
  • Dynamic: ✔️
  • Required:

Databricks account identifier

authentication

Databricks authentication configuration

This property allows you to configure authentication to Databricks; different properties should be set depending on the authentication type and the cloud provider. All configuration options can also be set using the standard Databricks environment variables. Check the Databricks authentication guide for more information.
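
As a minimal sketch, token-based authentication can be configured directly on the task; the token value below is a placeholder (in practice you would typically reference a secret rather than hard-code it):

```yaml
authentication:
  token: <your-token>
host: <your-host>
```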

configFile

  • Type: string
  • Dynamic: ✔️
  • Required:

Databricks configuration file; use this if you don't want to configure each Databricks account property one by one.

host

  • Type: string
  • Dynamic: ✔️
  • Required:

Databricks host

jobName

  • Type: string
  • Dynamic: ✔️
  • Required:

The name of the job

waitForCompletion

  • Type: string
  • Dynamic:
  • Required:
  • Format: duration

If set, the task will wait up to this duration for the job run to complete before timing out.

Outputs

jobId

  • Type: integer

The job identifier

jobURI

  • Type: string

The job URI on the Databricks console

runId

  • Type: integer

The run identifier

runURI

  • Type: string

The run URI on the Databricks console

Definitions

AuthenticationConfig

authType

  • Type: string
  • Dynamic:
  • Required:

azureClientId

  • Type: string
  • Dynamic: ✔️
  • Required:

azureClientSecret

  • Type: string
  • Dynamic: ✔️
  • Required:

azureTenantId

  • Type: string
  • Dynamic: ✔️
  • Required:

clientId

  • Type: string
  • Dynamic: ✔️
  • Required:

clientSecret

  • Type: string
  • Dynamic: ✔️
  • Required:

googleCredentials

  • Type: string
  • Dynamic: ✔️
  • Required:

googleServiceAccount

  • Type: string
  • Dynamic: ✔️
  • Required:

password

  • Type: string
  • Dynamic: ✔️
  • Required:

token

  • Type: string
  • Dynamic: ✔️
  • Required:

username

  • Type: string
  • Dynamic: ✔️
  • Required:

SqlTaskSetting

parameters

  • Type: object
  • SubType: string
  • Dynamic:
  • Required:

queryId

  • Type: string
  • Dynamic: ✔️
  • Required:

warehouseId

  • Type: string
  • Dynamic: ✔️
  • Required:
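
Based on these properties, a job task running an existing SQL query on a SQL warehouse might look like this sketch (the task key, query identifier, warehouse identifier, and parameter values are placeholders):

```yaml
jobTasks:
  - taskKey: runQuery
    sqlTask:
      queryId: <your-query-id>
      warehouseId: <your-warehouse-id>
      parameters:
        date: "2023-01-01"
```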

NotebookTaskSetting

baseParameters

  • Type: object
  • SubType: string
  • Dynamic:
  • Required:

notebookPath

  • Type: string
  • Dynamic: ✔️
  • Required:

source

  • Type: string
  • Dynamic:
  • Required:
  • Possible Values:
    • GIT
    • WORKSPACE
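
A notebook task using these properties could be sketched as follows (the notebook path, task key, cluster identifier, and parameter values are placeholders):

```yaml
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: runNotebook
    notebookTask:
      notebookPath: /Shared/MyNotebook
      source: WORKSPACE
      baseParameters:
        env: production
```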

SparkPythonTaskSetting

pythonFile

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

sparkPythonTaskSource

  • Type: string
  • Dynamic:
  • Required: ✔️
  • Possible Values:
    • GIT
    • WORKSPACE

parameters

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

JobTaskSetting

dbtTask

DBT task settings

dependsOn

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

Task dependencies; set this if multiple tasks are defined on the job.

description

  • Type: string
  • Dynamic: ✔️
  • Required:

Task description

existingClusterId

  • Type: string
  • Dynamic: ✔️
  • Required:

The identifier of the cluster

notebookTask

Notebook task settings

pipelineTask

Pipeline task settings

pythonWheelTask

Python Wheel task settings

sparkJarTask

Spark JAR task settings

sparkPythonTask

Spark Python task settings

sparkSubmitTask

Spark Submit task settings

sqlTask

SQL task settings

taskKey

  • Type: string
  • Dynamic: ✔️
  • Required:

Task key

timeoutSeconds

  • Type: integer
  • Dynamic:
  • Required:

Task timeout in seconds

PythonWheelTaskSetting

entryPoint

  • Type: string
  • Dynamic: ✔️
  • Required:

namedParameters

  • Type: object
  • SubType: string
  • Dynamic:
  • Required:

packageName

  • Type: string
  • Dynamic: ✔️
  • Required:

parameters

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
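
A Python wheel task using these properties could be sketched as follows (the package name, entry point, task key, and cluster identifier are placeholders):

```yaml
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: runWheel
    pythonWheelTask:
      packageName: my_package
      entryPoint: main
      namedParameters:
        env: production
```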

PipelineTaskSetting

fullRefresh

  • Type: boolean
  • Dynamic:
  • Required:

pipelineId

  • Type: string
  • Dynamic: ✔️
  • Required:

DbtTaskSetting

catalog

  • Type: string
  • Dynamic: ✔️
  • Required:

commands

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

schema

  • Type: string
  • Dynamic: ✔️
  • Required:

warehouseId

  • Type: string
  • Dynamic: ✔️
  • Required:
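
A dbt task using these properties could be sketched as follows (the warehouse identifier, catalog, schema, task key, and cluster identifier are placeholders):

```yaml
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: runDbt
    dbtTask:
      warehouseId: <your-warehouse-id>
      catalog: main
      schema: analytics
      commands:
        - dbt deps
        - dbt run
```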

SparkJarTaskSetting

jarUri

  • Type: string
  • Dynamic: ✔️
  • Required:

mainClassName

  • Type: string
  • Dynamic: ✔️
  • Required:

parameters

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
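
A Spark JAR task using these properties could be sketched as follows (the class name, task key, cluster identifier, and parameter values are placeholders):

```yaml
jobTasks:
  - existingClusterId: <your-cluster>
    taskKey: runJar
    sparkJarTask:
      mainClassName: com.example.Main
      parameters:
        - "2023-01-01"
```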

SparkSubmitTaskSetting

parameters

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required: