Run a task on an on-demand Databricks cluster

Source

id: on-demand-cluster-job
namespace: company.team

tasks:
  - id: create_cluster
    type: io.kestra.plugin.databricks.cluster.CreateCluster
    authentication:
      token: "{{ secret('DATABRICKS_TOKEN') }}"
    host: "{{ secret('DATABRICKS_HOST') }}"
    clusterName: kestra-demo
    nodeTypeId: n2-highmem-4
    numWorkers: 1
    sparkVersion: 13.0.x-scala2.12

  - id: allow_failure
    type: io.kestra.plugin.core.flow.AllowFailure
    tasks:
      - id: run_job
        type: io.kestra.plugin.databricks.job.CreateJob
        authentication:
          token: "{{ secret('DATABRICKS_TOKEN') }}"
        host: "{{ secret('DATABRICKS_HOST') }}"
        jobTasks:
          - existingClusterId: "{{ outputs.create_cluster.clusterId }}"
            taskKey: yourArbitraryTaskKey
            sparkPythonTask:
              pythonFile: /Shared/hello.py
              sparkPythonTaskSource: WORKSPACE
        waitForCompletion: PT5M

  - id: delete_cluster
    type: io.kestra.plugin.databricks.cluster.DeleteCluster
    authentication:
      token: "{{ secret('DATABRICKS_TOKEN') }}"
    host: "{{ secret('DATABRICKS_HOST') }}"
    clusterId: "{{ outputs.create_cluster.clusterId }}"

About this blueprint

This flow creates a Databricks cluster, runs a task on it, and then deletes the cluster.

  • The Python script referenced in the pythonFile property is stored in the Databricks workspace.
  • The flow runs the script on the newly created cluster and waits up to five minutes for the task to complete (PT5M is an ISO-8601 duration, declared in the waitForCompletion property). Even if the job fails, the AllowFailure task lets the execution continue (it ends in a WARNING state instead of FAILED), so the delete_cluster task still runs and the cluster is always deleted; the cluster ID is passed between the tasks through outputs, as shown in the sketch below.
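
The clusterId output of create_cluster is what ties the tasks together: run_job submits work to that cluster and delete_cluster tears it down. As an illustration of how such an output can be inspected, a hypothetical debugging task based on the core Log task could be placed right after create_cluster:

  - id: log_cluster_id
    type: io.kestra.plugin.core.log.Log
    # Hypothetical task, not part of the blueprint: prints the ID of the
    # cluster created by the preceding create_cluster task.
    message: "Created Databricks cluster {{ outputs.create_cluster.clusterId }}"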
