Split

yaml
type: "io.kestra.core.tasks.storages.Split"

Split a file from Kestra's internal storage into multiple files.

Examples

Split a file by size.

yaml
id: "split"
type: "io.kestra.core.tasks.storages.Split"
from: "kestra://long/url/file1.txt"
bytes: 10MB

Split a file by row count.

yaml
id: "split"
type: "io.kestra.core.tasks.storages.Split"
from: "kestra://long/url/file1.txt"
rows: 1000

Split a file into a defined number of partitions.

yaml
id: "split"
type: "io.kestra.core.tasks.storages.Split"
from: "kestra://long/url/file1.txt"
partitions: 8

Properties

from

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The file to be split.

bytes

  • Type: string
  • Dynamic: ✔️
  • Required: ❌

Split a large file into multiple chunks, each no larger than the size set in bytes.

Can be provided as a string in the format "10MB" or "200KB", or as a number of bytes. This lets you split a large file into smaller chunks by lines and process them in parallel.

For example, MySQL limits the size of a single query through its max_allowed_packet setting. With a limit of 16MB, a bulk insert query whose input data is larger than 16MB will fail. Splitting the input data into chunks smaller than max_allowed_packet (e.g., 10MB) is a common way around this limitation: you insert the data in multiple smaller queries, which can run in parallel. This approach also tends to be more manageable in terms of memory utilization, especially for very large datasets.

partitions

  • Type: integer
  • Dynamic: ✔️
  • Required: ❌

Split a file into a fixed number of partitioned files. For example, if you have a file with 1000 lines and you set partitions to 10, the file will be split into 10 files with 100 lines each.

rows

  • Type: integer
  • Dynamic: ✔️
  • Required: ❌

The maximum number of rows per chunk. The file is split into chunks that each contain at most that many rows.

separator

  • Type: string
  • Dynamic:
  • Required: ❌
  • Default: \n

The separator used to split a file into chunks. By default, it's a newline \n character. If you are on Windows, you might want to use \r\n instead.
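
As a minimal sketch, a file with Windows line endings could be split as shown below. The file URI is the same placeholder used in the examples above, and whether your file really uses \r\n endings is an assumption to verify first. The separator is written in double quotes so that YAML decodes the escape sequences.

```yaml
id: "split"
type: "io.kestra.core.tasks.storages.Split"
from: "kestra://long/url/file1.txt"
rows: 1000
# Double quotes make YAML decode the escapes into CR + LF characters;
# inside single quotes they would remain literal backslash sequences.
separator: "\r\n"
```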

Outputs

uris

  • Type: array
  • SubType: string
  • Dynamic:
  • Required:

The URIs of the split files in Kestra's internal storage.
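
A minimal sketch of how the uris output might be consumed, assuming the split task sits in a flow next to the core EachParallel and Log tasks. The flow id, the namespace, and the task ids each and log_chunk are hypothetical, and the input URI is the placeholder from the examples above. Check the task types against your Kestra version before relying on this.

```yaml
id: split_and_process        # hypothetical flow id
namespace: dev               # hypothetical namespace

tasks:
  - id: split
    type: io.kestra.core.tasks.storages.Split
    from: "kestra://long/url/file1.txt"
    rows: 1000

  # Iterate over the chunk URIs produced by the split task, in parallel.
  - id: each
    type: io.kestra.core.tasks.flows.EachParallel
    value: "{{ outputs.split.uris }}"
    tasks:
      # Placeholder step: log the URI of the current chunk.
      - id: log_chunk
        type: io.kestra.core.tasks.log.Log
        message: "{{ taskrun.value }}"
```

In a real flow, the Log task would be replaced by whatever step processes a single chunk, such as a bulk insert.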