Source
yaml
id: start-job-on-existing-cluster
namespace: company.team
tasks:
- id: create_job
type: io.kestra.plugin.databricks.job.CreateJob
authentication:
token: "{{ secret('DATABRICKS_TOKEN') }}"
host: "{{ secret('DATABRICKS_HOST') }}"
jobTasks:
- existingClusterId: abcdefg12345678
taskKey: yourArbitraryTaskKey
sparkPythonTask:
pythonFile: /Shared/hello.py
sparkPythonTaskSource: WORKSPACE
waitForCompletion: PT5H
- id: log_status
type: io.kestra.plugin.core.log.Log
message: The job finished, all done!
About this blueprint
Databricks
This flow will start a job on an existing Databricks cluster. The Python script referenced on pythonFile
property is stored in the Databricks workspace. The flow will wait up to five hours for the task to complete before running the next task (here, simply printing a log message to the console).
More Related Blueprints