IonToParquet

yaml
type: "io.kestra.plugin.serdes.parquet.IonToParquet"
yaml
id: ion_to_parquet
namespace: company.team

tasks:
  - id: download_csv
    type: io.kestra.plugin.core.http.Download
    description: salaries of data professionals from 2020 to 2023 (source ai-jobs.net)
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/salaries.csv

  - id: avg_salary_by_job_title
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      data.csv: "{{ outputs.download_csv.uri }}"
    sql: |
      SELECT
        job_title,
        ROUND(AVG(salary),2) AS avg_salary
      FROM read_csv_auto('{{ workingDir }}/data.csv', header=True)
      GROUP BY job_title
      HAVING COUNT(job_title) > 10
      ORDER BY avg_salary DESC;
    store: true

  - id: result
    type: io.kestra.plugin.serdes.parquet.IonToParquet
    from: "{{ outputs.avg_salary_by_job_title.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Salary",
        "namespace": "com.example.salary",
        "fields": [
          {"name": "job_title", "type": "string"},
          {"name": "avg_salary", "type": "double"}
        ]
      }
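The flow above stops once the Parquet file is written. As a minimal sketch of how a later task might consume it, the fragment below appends one more entry to the same `tasks` list. It assumes the converter exposes its result as `outputs.result.uri`, following the `outputs.<task_id>.uri` pattern the flow already uses for the two earlier tasks. The task id `log_parquet_uri` is hypothetical, chosen only for illustration.

```yaml
  # Hypothetical follow-up task: logs where the converted file was stored.
  # Assumes the IonToParquet task with id "result" exposes a `uri` output.
  - id: log_parquet_uri
    type: io.kestra.plugin.core.log.Log
    message: "Parquet file stored at {{ outputs.result.uri }}"
```

Any task that accepts a file URI could take the place of the log task here.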
Properties
Default: GZIP
Possible Values: UNCOMPRESSED, SNAPPY, GZIP, ZSTD
Default: yyyy-MM-dd[XXX]
Default: yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
Default: .
Default: 1048576
SubType: string
Default: ["f","false","disabled","0","off","no",""]
Default: false
SubType: string
Default: ["","#N/A","#N/A N/A","#NA","-1.#IND","-1.#QNAN","-NaN","1.#IND","1.#QNAN","NA","n/a","nan","null"]
Default: 100
Default: ERROR
Possible Values: ERROR, WARN, SKIP
Default: 1048576
Default: V2
Possible Values: V1, V2
Default: 134217728
Default: false
Default: HH:mm[:ss][.SSSSSS][XXX]
Default: Etc/UTC
SubType: string
Default: ["t","true","enabled","1","on","yes"]
Format: uri
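The defaults listed above apply whenever a property is left out of the task. As a hedged sketch of overriding a few of them, the variant below rewrites the `result` task from the example. The property names `compressionCodec`, `parquetVersion`, and `timeZoneId` do not appear in the text above. They are assumptions matched to the GZIP, V2, and Etc/UTC defaults, so check them against the plugin's own property reference before relying on them. `Europe/Paris` is only an illustrative time zone.

```yaml
  - id: result
    type: io.kestra.plugin.serdes.parquet.IonToParquet
    from: "{{ outputs.avg_salary_by_job_title.uri }}"
    # Assumed property name; ZSTD is one of the listed possible values (default GZIP).
    compressionCodec: ZSTD
    # Assumed property name; V2 is already the listed default, set here only to be explicit.
    parquetVersion: V2
    # Assumed property name; replaces the listed default of Etc/UTC.
    timeZoneId: Europe/Paris
    schema: |
      {
        "type": "record",
        "name": "Salary",
        "namespace": "com.example.salary",
        "fields": [
          {"name": "job_title", "type": "string"},
          {"name": "avg_salary", "type": "double"}
        ]
      }
```

Properties left unset keep the defaults shown in the list.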