ParquetWriter

yaml
type: "io.kestra.plugin.serdes.parquet.ParquetWriter"

Reads a provided file containing Ion-serialized data and converts it to Parquet.
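A minimal usage sketch, showing only the two required properties. The task ids `extract` and `to_parquet` and the `Order` schema are hypothetical; `extract` stands for any upstream task (omitted here) that outputs an Ion file.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    # Assumption: an upstream task named "extract" exposes the Ion file as `uri`
    from: "{{ outputs.extract.uri }}"
    # Avro schema passed as a JSON string (hypothetical record)
    schema: |
      {
        "type": "record",
        "name": "Order",
        "fields": [
          {"name": "id", "type": "long"},
          {"name": "customer", "type": "string"}
        ]
      }
```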

Properties

from

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

Source file URI

schema

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The Avro schema associated with the data

compressionCodec

  • Type: string
  • Dynamic:
  • Required:
  • Default: GZIP
  • Possible Values:
    • UNCOMPRESSED
    • SNAPPY
    • GZIP
    • ZSTD

The compression codec to use

dateFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: yyyy-MM-dd[XXX]

Format to use when parsing dates

datetimeFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]

Format to use when parsing datetimes

Default value is yyyy-MM-dd'T'HH:mm[:ss][.SSSSSS][XXX]
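As an illustration, the sketch below overrides the date and datetime patterns for source data written as `31/12/2023` and `31/12/2023 23:59:00`. It assumes the patterns follow Java `DateTimeFormatter` syntax, as the bracketed optional sections in the defaults suggest; the task ids and the `Event` schema are hypothetical.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"   # hypothetical upstream task
    dateFormat: "dd/MM/yyyy"
    datetimeFormat: "dd/MM/yyyy HH:mm:ss"
    schema: |
      {
        "type": "record",
        "name": "Event",
        "fields": [
          {"name": "day", "type": {"type": "int", "logicalType": "date"}},
          {"name": "created", "type": {"type": "long", "logicalType": "timestamp-millis"}}
        ]
      }
```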

decimalSeparator

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: .

Character to recognize as decimal point (e.g. use ‘,’ for European data).

Default value is '.'
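For source data that writes numbers as `12,50`, a configuration along these lines may apply. This is a sketch: the task ids and the `Price` schema are hypothetical, and it assumes the separator is applied to floating-point fields.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"   # hypothetical upstream task
    decimalSeparator: ","               # "12,50" is read as 12.50
    schema: |
      {
        "type": "record",
        "name": "Price",
        "fields": [
          {"name": "amount", "type": "double"}
        ]
      }
```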

dictionaryPageSize

  • Type: integer
  • Dynamic:
  • Required:
  • Default: 1048576

Maximum dictionary page size, in bytes

falseValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [f, false, disabled, 0, off, no, ""]

Values to consider as false

inferAllFields

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Try to infer all fields

If true, all fields are inferred using trueValues, falseValues and nullValues. If false, booleans and nulls are inferred only on fields declared in the schema as boolean or null.
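The sketch below shows how inference might be configured together with custom marker lists. The marker values, task ids, and `Flag` schema are hypothetical; the markers are quoted so that the YAML parser keeps them as strings.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"   # hypothetical upstream task
    inferAllFields: false               # infer only on fields declared as boolean or null
    trueValues: ["Y", "yes"]
    falseValues: ["N", "no"]
    nullValues: ["", "NULL", "n/a"]
    schema: |
      {
        "type": "record",
        "name": "Flag",
        "fields": [
          {"name": "active", "type": ["null", "boolean"]},
          {"name": "label", "type": "string"}
        ]
      }
```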

nullValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: ["", #N/A, #N/A N/A, #NA, -1.#IND, -1.#QNAN, -NaN, 1.#IND, 1.#QNAN, NA, n/a, nan, null]

Values to consider as null

pageSize

  • Type: integer
  • Dynamic:
  • Required:
  • Default: 1048576

Target page size, in bytes

rowGroupSize

  • Type: integer
  • Dynamic:
  • Required:
  • Default: 134217728

Target row group size, in bytes
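A sketch of the storage-related properties set together, here trading the default GZIP codec for ZSTD and halving the row group size. The chosen values are illustrative rather than recommendations, and the task ids and `Row` schema are hypothetical.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"   # hypothetical upstream task
    compressionCodec: ZSTD
    rowGroupSize: 67108864              # 64 MiB (default is 134217728)
    pageSize: 1048576                   # 1 MiB (the default)
    dictionaryPageSize: 1048576         # 1 MiB (the default)
    schema: |
      {
        "type": "record",
        "name": "Row",
        "fields": [
          {"name": "id", "type": "long"}
        ]
      }
```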

strictSchema

  • Type: boolean
  • Dynamic:
  • Required:
  • Default: false

Whether to consider a field present in the data but not declared in the schema as an error

Default value is false

timeFormat

  • Type: string
  • Dynamic: ✔️
  • Required:
  • Default: HH:mm[:ss][.SSSSSS][XXX]

Format to use when parsing time

timeZoneId

  • Type: string
  • Dynamic:
  • Required:
  • Default: Europe/Paris

Timezone to use when no timezone can be parsed on the source.

If null, the timezone will be UTC. Default value is the system timezone.
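When the source datetimes carry no offset, setting the timezone explicitly avoids depending on the system default. A sketch, with hypothetical task ids and `Event` schema:

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"      # hypothetical upstream task
    datetimeFormat: "yyyy-MM-dd HH:mm:ss"  # no offset in the source values
    timeZoneId: "UTC"                      # used because no timezone can be parsed
    schema: |
      {
        "type": "record",
        "name": "Event",
        "fields": [
          {"name": "created", "type": {"type": "long", "logicalType": "timestamp-millis"}}
        ]
      }
```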

trueValues

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [t, true, enabled, 1, on, yes]

Values to consider as true

version

  • Type: string
  • Dynamic:
  • Required:
  • Default: V2
  • Possible Values:
    • V1
    • V2

Parquet format version to write

Outputs

uri

  • Type: string
  • Dynamic:
  • Required:
  • Format: uri

URI of a temporary result file
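The output can be referenced from a later task through an expression. In the sketch below the task ids and the `Row` schema are hypothetical, and the `Log` task type is an assumption that may be named differently depending on the Kestra version.

```yaml
tasks:
  - id: to_parquet
    type: "io.kestra.plugin.serdes.parquet.ParquetWriter"
    from: "{{ outputs.extract.uri }}"   # hypothetical upstream task
    schema: |
      {
        "type": "record",
        "name": "Row",
        "fields": [
          {"name": "id", "type": "long"}
        ]
      }

  - id: log_result
    type: "io.kestra.core.tasks.log.Log"   # assumption: type name varies by version
    message: "Parquet file written to {{ outputs.to_parquet.uri }}"
```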