```yaml
id: download-parquet-from-databricks
namespace: company.team

description: >
  This flow will download a Parquet file from Databricks File System (DBFS) to
  Kestra's internal storage.

tasks:
  - id: download
    type: io.kestra.plugin.databricks.dbfs.Download
    authentication:
      token: "{{ secret('DATABRICKS_TOKEN') }}"
    host: "{{ secret('DATABRICKS_HOST') }}"
    from: /Shared/myFile.parquet

  - id: process_downloaded_file
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    dependencies:
      - pandas
    script: |
      import pandas as pd
      df = pd.read_parquet("{{ outputs.download.uri }}")
      df.head()
```
About this blueprint
This flow retrieves a Parquet file stored in Databricks File System (DBFS) and makes it available inside Kestra for downstream processing.
It performs two main steps:
- Downloads the Parquet file from DBFS into Kestra’s internal storage.
- Loads and inspects the dataset using a Python script running in a Docker container, making it easy to validate or transform the data with Pandas.
This pattern is useful when you need to reuse Databricks-generated datasets outside of Databricks, for example in data quality checks, exploratory analysis, or additional ETL steps orchestrated by Kestra.
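As a concrete illustration of the data quality checks mentioned above, the `process_downloaded_file` script could run a few simple validations with Pandas before any downstream step uses the data. The sketch below is not part of the blueprint; the `basic_quality_checks` helper, the column names, and the stand-in DataFrame are all hypothetical examples of what such a check might look like.

```python
import pandas as pd

def basic_quality_checks(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """Return a list of human-readable problems found in the DataFrame."""
    problems = []
    if df.empty:
        problems.append("dataset is empty")
    # Columns that the downstream steps expect but the file does not provide.
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Null counts per expected column that is actually present.
    for col in (c for c in required_columns if c in df.columns):
        nulls = int(df[col].isna().sum())
        if nulls:
            problems.append(f"column {col!r} has {nulls} null values")
    return problems

# Inside the flow, the DataFrame would come from
# pd.read_parquet("{{ outputs.download.uri }}"); here we use a stand-in.
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.5, None, 12.0]})
print(basic_quality_checks(df, ["order_id", "amount", "customer_id"]))
```

In a real flow you would typically raise an exception when `problems` is non-empty, so the Kestra task fails and halts downstream processing.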