Source
yaml
id: on-demand-cluster-job
namespace: company.team
tasks:
- id: create_cluster
type: io.kestra.plugin.databricks.cluster.CreateCluster
authentication:
token: "{{ secret('DATABRICKS_TOKEN') }}"
host: "{{ secret('DATABRICKS_HOST') }}"
clusterName: kestra-demo
nodeTypeId: n2-highmem-4
numWorkers: 1
sparkVersion: 13.0.x-scala2.12
- id: allow_failure
type: io.kestra.plugin.core.flow.AllowFailure
tasks:
- id: run_job
type: io.kestra.plugin.databricks.job.CreateJob
authentication:
token: "{{ secret('DATABRICKS_TOKEN') }}"
host: "{{ secret('DATABRICKS_HOST') }}"
jobTasks:
- existingClusterId: "{{ outputs.create_cluster.clusterId }}"
taskKey: yourArbitraryTaskKey
sparkPythonTask:
pythonFile: /Shared/hello.py
sparkPythonTaskSource: WORKSPACE
waitForCompletion: PT5M
- id: delete_cluster
type: io.kestra.plugin.databricks.cluster.DeleteCluster
authentication:
token: "{{ secret('DATABRICKS_TOKEN') }}"
host: "{{ secret('DATABRICKS_HOST') }}"
clusterId: "{{ outputs.create_cluster.clusterId }}"
About this blueprint
Databricks
This flow will create a Databricks cluster, run a task on the cluster, and then delete the cluster.
- The Python script referenced on
pythonFile
property is stored in the Databricks workspace. - The flow will run the script on a Databricks cluster and wait up to five minutes (as declared on the
waitForCompletion
property) for the task to complete. Even if the job fails, theAllowFailure
tasks ensures that Databricks cluster will be deleted in the end.
More Related Blueprints