Create a Spark job on a Databricks cluster and wait for its completion

Source

```yaml
id: start-job-on-existing-cluster
namespace: company.team

tasks:
  - id: create_job
    type: io.kestra.plugin.databricks.job.CreateJob
    authentication:
      token: "{{ secret('DATABRICKS_TOKEN') }}"
    host: "{{ secret('DATABRICKS_HOST') }}"
    jobTasks:
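      # ID of an existing Databricks cluster to run the job on (placeholder value)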
      - existingClusterId: abcdefg12345678
        taskKey: yourArbitraryTaskKey
        sparkPythonTask:
          pythonFile: /Shared/hello.py
          sparkPythonTaskSource: WORKSPACE
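    # ISO 8601 duration: wait up to five hours for the job run to finish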
    waitForCompletion: PT5H

  - id: log_status
    type: io.kestra.plugin.core.log.Log
    message: The job finished, all done!
```

About this blueprint

This flow starts a job on an existing Databricks cluster. The Python script referenced in the `pythonFile` property is stored in the Databricks workspace, which is why `sparkPythonTaskSource` is set to `WORKSPACE`. The flow waits up to five hours (`waitForCompletion: PT5H`) for the job to complete before running the next task, which simply logs a confirmation message.
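
The workspace script itself is not part of the blueprint. As a purely hypothetical illustration, a minimal `/Shared/hello.py` might look like the sketch below; the PySpark usage is an assumption, and any Python file stored in the workspace works:

```python
# Hypothetical contents of /Shared/hello.py (not part of the blueprint).
from pyspark.sql import SparkSession

# On a Databricks cluster, a SparkSession can be obtained for the job.
spark = SparkSession.builder.getOrCreate()

# Minimal demonstration: build a tiny DataFrame and print it.
df = spark.createDataFrame([("hello", 1), ("world", 2)], ["word", "count"])
df.show()
```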
