IonToAvro IonToAvro

yaml
type: "io.kestra.plugin.serdes.avro.IonToAvro"

Read a provided file containing ion serialized data and convert it to avro.

Examples

Convert a CSV file to the Parquet format using Avro schema

yaml
id: divvy_tripdata
namespace: dev

variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"

tasks:
  - id: get_zipfile
    type: io.kestra.plugin.fs.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"

  - id: convert
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{outputs.unzip.files[render(vars.file_id) ~ '-divvy-tripdata.csv']}}"

  - id: to_parquet
    type: io.kestra.plugin.serdes.avro.AvroWriter
    from: "{{ outputs.convert.uri }}"
    datetimeFormat: "yyyy-MM-dd' 'HH:mm:ss"
    schema: |
      {
        "type": "record",
        "name": "Ride",
        "namespace": "com.example.bikeshare",
        "fields": [
          {"name": "ride_id", "type": "string"},
          {"name": "rideable_type", "type": "string"},
          {"name": "started_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
          {"name": "ended_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
          {"name": "start_station_name", "type": "string"},
          {"name": "start_station_id", "type": "string"},
          {"name": "end_station_name", "type": "string"},
          {"name": "end_station_id", "type": "string"},
          {"name": "start_lat", "type": "double"},
          {"name": "start_lng", "type": "double"},
          {
            "name": "end_lat",
            "type": ["null", "double"],
            "default": null
          },
          {
            "name": "end_lng",
            "type": ["null", "double"],
            "default": null
          },
          {"name": "member_casual", "type": "string"}
        ]
      }

Properties

from

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

Source file URI

schema

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The avro schema associated to the data

dateFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: yyyy-MM-dd[XXX]

Format to use when parsing date

datetimeFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]

Format to use when parsing datetime

Default value is yyyy-MM-dd'T'HH:mm[][.SSSSSS]XXX

decimalSeparator

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: .

Character to recognize as decimal point (e.g. use ‘,’ for European data).

Default value is '.'

falseValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [f, false, disabled, 0, off, no, ]

Values to consider as False

inferAllFields

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Try to infer all fields

If true, we try to infer all fields with trueValues, trueValues & nullValues.If false, we will infer bool & null only on field declared on schema as null and bool.

nullValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [, #N/A, #N/A N/A, #NA, -1.#IND, -1.#QNAN, -NaN, 1.#IND, 1.#QNAN, NA, n/a, nan, null]

Values to consider as null

strictSchema

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Whether to consider a field present in the data but not declared in the schema as an error

Default value is false

timeFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: HH:mm[:ss][.SSSSSS][XXX]

Format to use when parsing time

timeZoneId

  • Type: string
  • Dynamic:
  • Required:
  • Default: Etc/UTC

Timezone to use when no timezone can be parsed on the source.

If null, the timezone will be UTC Default value is system timezone

trueValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [t, true, enabled, 1, on, yes]

Values to consider as True

Outputs

uri

  • Type: string
  • Dynamic:
  • Required:
  • Format: uri

URI of a temporary result file

Definitions

Was this page helpful?