
DeduplicateItems
Deduplicate a file by retaining only the latest item for each extracted key.
Rather than loading the entire file into memory, the task reads the input file twice.
The first pass builds an in-memory deduplication map holding the last line observed for each key.
The second pass rewrites the file without the duplicates. Keep this in mind when using the task: the file is read twice, and the map holds one entry per distinct key.
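The two-pass approach can be sketched in Python as follows. This is only an illustration under stated assumptions, not the plugin's implementation: the input is assumed to be a JSON Lines file, `deduplicate_file` and `key_fn` are hypothetical names, and the map stores the index of the last line per key rather than the line itself. The returned counters borrow the names of the task outputs, on the assumption that every processed item that is not the last for its key counts as dropped.

```python
import json
from typing import Callable


def deduplicate_file(src: str, dst: str, key_fn: Callable[[dict], str]) -> dict:
    """Keep only the last item per key; returns counters (hypothetical helper)."""
    last_line_for_key: dict[str, int] = {}
    processed = 0

    # Pass 1: remember the index of the last line seen for each key.
    with open(src, encoding="utf-8") as f:
        for index, line in enumerate(f):
            if not line.strip():
                continue  # skip blank lines
            last_line_for_key[key_fn(json.loads(line))] = index
            processed += 1

    keep = set(last_line_for_key.values())

    # Pass 2: stream the file again, copying only the retained lines.
    with open(src, encoding="utf-8") as f, open(dst, "w", encoding="utf-8") as out:
        for index, line in enumerate(f):
            if index in keep:
                out.write(line if line.endswith("\n") else line + "\n")

    return {
        "numKeys": len(last_line_for_key),
        "processedItemsTotal": processed,
        "droppedItemsTotal": processed - len(last_line_for_key),
    }
```

In this sketch, memory use depends on the number of distinct keys rather than on the file size, at the cost of reading the input twice.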
type: "io.kestra.plugin.core.storage.DeduplicateItems"
Examples
id: "deduplicateitems"
type: "io.kestra.plugin.core.storage.DeduplicateItems"
tasks:
  - id: deduplicate
    type: io.kestra.plugin.core.storage.DeduplicateItems
    from: "{{ inputs.uri }}"
    expr: "{{ key }}"
Properties
expr (string, required, non-dynamic)
The Pebble expression to extract the deduplication key from each item
The Pebble expression can also combine several fields into a composite key.
from (string, required)
The file to be deduplicated
A Pebble expression referencing an internal storage URI, e.g. {{ outputs.mytask.uri }}.
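As a rough analogy for the composite key mentioned under expr, the Python sketch below joins two fields into a single key, so two items count as duplicates only when both fields match. The field names `tenant` and `id` and the helper `composite_key` are hypothetical, and the equivalent Pebble syntax for expr is deliberately not shown here.

```python
def composite_key(item: dict) -> str:
    # Join the two hypothetical fields; the separator avoids ambiguous keys
    # such as ("ab", "c") versus ("a", "bc").
    return f"{item['tenant']}|{item['id']}"


items = [
    {"tenant": "acme", "id": 1, "v": "old"},
    {"tenant": "beta", "id": 1, "v": "only"},
    {"tenant": "acme", "id": 1, "v": "new"},
]

# Later items overwrite earlier ones, so the latest item per key is kept.
latest = {composite_key(item): item for item in items}
```

Here the two `acme` items share a key and collapse to the later one, while the `beta` item survives even though its `id` is the same.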
Outputs
droppedItemsTotal (integer)
The total number of items that were dropped by the task
numKeys (integer)
The number of distinct keys observed by the task
processedItemsTotal (integer)
The total number of items that were processed by the task
uri (string)
The deduplicated file URI