Load

yaml

type: "io.kestra.plugin.gcp.bigquery.Load"

Load data from local file to BigQuery

Examples

Load an csv file from an input file

yaml

id: gcp_bq_load
namespace: company.team

tasks:
  - id: load
    type: io.kestra.plugin.gcp.bigquery.Load
    from: "{{ inputs.file }}"
    destinationTable: "my_project.my_dataset.my_table"
    format: CSV
    csvOptions:
      fieldDelimiter: ";"

Properties

`autodetect`

Type: boolean
Dynamic: ❌
Required: ❌

Experimental Automatic inference of the options and schema for CSV and JSON sources.

`avroOptions`

Type: AbstractLoad-AvroOptions
Dynamic: ❓
Required: ❌

Avro parsing options.

`clusteringFields`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌

The clustering specification for the destination table.

`createDisposition`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- CREATE_IF_NEEDED
- CREATE_NEVER

Whether the job is allowed to create tables.

`csvOptions`

Type: AbstractLoad-CsvOptions
Dynamic: ❓
Required: ❌

Csv parsing options.

`destinationTable`

Type: string
Dynamic: ✔️
Required: ❌

The table where to put query results.

If not provided, a new table is created.

`failedOnEmpty`

Type: boolean
Dynamic: ❌
Required: ❌
Default: true

Does the task will failed for an empty file

`format`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- CSV
- JSON
- AVRO
- PARQUET
- ORC

The source format, and possibly some parsing options, of the external data.

`from`

Type: string
Dynamic: ✔️
Required: ❌

The fully-qualified URIs that point to source data

`ignoreUnknownValues`

Type: boolean
Dynamic: ❌
Required: ❌

Whether BigQuery should allow extra values that are not represented in the table schema.

If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. By default unknown values are not allowed.

`location`

Type: string
Dynamic: ✔️
Required: ❌

The geographic location where the dataset should reside.

This property is experimental and might be subject to change or removed.

See Dataset Location

`maxBadRecords`

Type: integer
Dynamic: ❌
Required: ❌

The maximum number of bad records that BigQuery can ignore when running the job.

If the number of bad records exceeds this value, an invalid error is returned in the job result. By default, no bad record is ignored.

`projectId`

Type: string
Dynamic: ✔️
Required: ❌

The GCP project ID.

`retryAuto`

Type:
Dynamic: ❌
Required: ❌

`retryMessages`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [due to concurrent update, Retrying the job may solve the problem]

The messages which would trigger an automatic retry.

Message is tested as a substring of the full message, and is case insensitive.

`retryReasons`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [rateLimitExceeded, jobBackendError, internalError, jobInternalError]

The reasons which would trigger an automatic retry.

`schema`

Type: object
Dynamic: ❌
Required: ❌

The schema for the destination table.

The schema can be omitted if the destination table already exists, or if you're loading data from a Google Cloud Datastore backup (i.e. DATASTORE_BACKUP format option).

yaml

schema:
  fields:
    - name: colA
      type: STRING
    - name: colB
      type: NUMERIC

See type from StandardSQLTypeName

`schemaUpdateOptions`

Type: array
SubType: string
Dynamic: ❌
Required: ❌

Experimental Options allowing the schema of the destination table to be updated as a side effect of the query job.

Schema update options are supported in two cases: when writeDisposition is WRITE_APPEND; when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema.

`scopes`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [https://www.googleapis.com/auth/cloud-platform]

The GCP scopes to be used.

`serviceAccount`

Type: string
Dynamic: ✔️
Required: ❌

The GCP service account key.

`timePartitioningField`

Type: string
Dynamic: ✔️
Required: ❌

The time partitioning field for the destination table.

`timePartitioningType`

Type: string
Dynamic: ✔️
Required: ❌
Default: DAY
Possible Values:
- DAY
- HOUR
- MONTH
- YEAR

The time partitioning type specification for the destination table.

`writeDisposition`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- WRITE_TRUNCATE
- WRITE_APPEND
- WRITE_EMPTY

The action that should occur if the destination table already exists.

Outputs

`destinationTable`

Type: string
Required: ❌

Destination table

`jobId`

Type: string
Required: ❌

The job id

`rows`

Type: integer
Required: ❌

Output rows count

Definitions

`io.kestra.core.models.tasks.retrys.Constant`

Properties

`interval`

Type: string
Dynamic: ❓
Required: ✔️
Format: duration

`type`

Type: string
Dynamic: ❓
Required: ✔️
Default: constant

`behavior`

Type: string
Dynamic: ❓
Required: ❌
Default: RETRY_FAILED_TASK
Possible Values:
- RETRY_FAILED_TASK
- CREATE_NEW_EXECUTION

`maxAttempt`

Type: integer
Dynamic: ❓
Required: ❌
Minimum: >= 1

`maxDuration`

Type: string
Dynamic: ❓
Required: ❌
Format: duration

`warningOnRetry`

Type: boolean
Dynamic: ❓
Required: ❌
Default: false

`io.kestra.core.models.tasks.retrys.Random`

Properties

`maxInterval`

Type: string
Dynamic: ❓
Required: ✔️
Format: duration

`minInterval`

Type: string
Dynamic: ❓
Required: ✔️
Format: duration

`type`

Type: string
Dynamic: ❓
Required: ✔️
Default: random

`behavior`

Type: string
Dynamic: ❓
Required: ❌
Default: RETRY_FAILED_TASK
Possible Values:
- RETRY_FAILED_TASK
- CREATE_NEW_EXECUTION

`maxAttempt`

Type: integer
Dynamic: ❓
Required: ❌
Minimum: >= 1

`maxDuration`

Type: string
Dynamic: ❓
Required: ❌
Format: duration

`warningOnRetry`

Type: boolean
Dynamic: ❓
Required: ❌
Default: false

`io.kestra.plugin.gcp.bigquery.AbstractLoad-CsvOptions`

Properties

`allowJaggedRows`

Type: boolean
Dynamic: ❌
Required: ❌

Whether BigQuery should accept rows that are missing trailing optional columns.

If true, BigQuery treats missing trailing columns as null values. If {@code false}, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. By default, rows with missing trailing columns are considered bad records.

`allowQuotedNewLines`

Type: boolean
Dynamic: ✔️
Required: ❌

Whether BigQuery should allow quoted data sections that contain newline characters in a CSV file.

By default quoted newline are not allowed.

`encoding`

Type: string
Dynamic: ✔️
Required: ❌

The character encoding of the data.

The supported values are UTF-8 or ISO-8859-1. The default value is UTF-8. BigQuery decodes the data after the raw, binary data has been split using the values set in {@link #setQuote(String)} and {@link #setFieldDelimiter(String)}.

`fieldDelimiter`

Type: string
Dynamic: ✔️
Required: ❌

The separator for fields in a CSV file.

BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (',').

`quote`

Type: string
Dynamic: ✔️
Required: ❌

The value that is used to quote data sections in a CSV file.

BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double-quote ('"'). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set {@link #setAllowQuotedNewLines(boolean)} property to {@code true}.

`skipLeadingRows`

Type: integer
Dynamic: ❌
Required: ❌

The number of rows at the top of a CSV file that BigQuery will skip when reading the data

The default value is 0. This property is useful if you have header rows in the file that should be skipped.

`io.kestra.core.models.tasks.retrys.Exponential`

Properties

`interval`

Type: string
Dynamic: ❓
Required: ✔️
Format: duration

`maxInterval`

Type: string
Dynamic: ❓
Required: ✔️
Format: duration

`type`

Type: string
Dynamic: ❓
Required: ✔️
Default: exponential

`behavior`

Type: string
Dynamic: ❓
Required: ❌
Default: RETRY_FAILED_TASK
Possible Values:
- RETRY_FAILED_TASK
- CREATE_NEW_EXECUTION

`delayFactor`

Type: number
Dynamic: ❓
Required: ❌

`maxAttempt`

Type: integer
Dynamic: ❓
Required: ❌
Minimum: >= 1

`maxDuration`

Type: string
Dynamic: ❓
Required: ❌
Format: duration

`warningOnRetry`

Type: boolean
Dynamic: ❓
Required: ❌
Default: false

`io.kestra.plugin.gcp.bigquery.AbstractLoad-AvroOptions`

Properties

`useAvroLogicalTypes`

Type: boolean
Dynamic: ❌
Required: ❌

If format is set to AVRO, you can interpret logical types into their corresponding types (such as TIMESTAMP) instead of only using their raw types (such as INTEGER)

The value may be null.

Metrics

`bad.records`

Type: counter (records)

`duration`

Type: timer

`input.bytes`

Type: counter (bytes)

`input.files`

Type: counter (files)

`output.bytes`

Type: counter (bytes)

`output.rows`

Type: counter (records)

Was this page helpful?

​Load

Load

interval

type

behavior

maxAttempt

maxDuration

warningOnRetry

maxInterval

minInterval

type

behavior

maxAttempt

maxDuration

warningOnRetry

allowJaggedRows

allowQuotedNewLines

encoding

fieldDelimiter

quote

skipLeadingRows

interval

maxInterval

type

behavior

delayFactor

maxAttempt

maxDuration

warningOnRetry

useAvroLogicalTypes

Load

`interval`

`type`

`behavior`

`maxAttempt`

`maxDuration`

`warningOnRetry`

`maxInterval`

`minInterval`

`type`

`behavior`

`maxAttempt`

`maxDuration`

`warningOnRetry`

`allowJaggedRows`

`allowQuotedNewLines`

`encoding`

`fieldDelimiter`

`quote`

`skipLeadingRows`

`interval`

`maxInterval`

`type`

`behavior`

`delayFactor`

`maxAttempt`

`maxDuration`

`warningOnRetry`

`useAvroLogicalTypes`