Source
yaml
id: download-parquet-from-databricks
namespace: company.team
description: >
This flow will download a Parquet file from Databricks File System (DBFS) to
Kestra's internal storage.
tasks:
- id: download
type: io.kestra.plugin.databricks.dbfs.Download
authentication:
token: "{{ secret('DATABRICKS_TOKEN') }}"
host: "{{ secret('DATABRICKS_HOST') }}"
from: /Shared/myFile.parquet
- id: process_downloaded_file
type: io.kestra.plugin.scripts.python.Script
taskRunner:
type: io.kestra.plugin.scripts.runner.docker.Docker
containerImage: ghcr.io/kestra-io/pydata:latest
script: |
import pandas as pd
df = pd.read_parquet("{{ outputs.download.uri }}")
df.head()
About this blueprint
Python Outputs Databricks
This flow will:
- Download a Parquet file from Databricks File System (DBFS) to Kestra's internal storage
- Use the file in a Python script running in a container