LoadFromGcs | Kestra

LoadFromGcs

Load GCS objects into BigQuery

yaml
type: "io.kestra.plugin.gcp.bigquery.LoadFromGcs"

Examples

yaml
id: gcp_bq_load_from_gcs
namespace: company.team

tasks:
  - id: http_download
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv

  - id: csv_to_ion
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{ outputs.http_download.uri }}"
    header: true

  - id: ion_to_avro
    type: io.kestra.plugin.serdes.avro.IonToAvro
    from: "{{ outputs.csv_to_ion.uri }}"
    schema: |
      {
        "type": "record",
        "name": "Order",
        "namespace": "com.example.order",
        "fields": [
          {"name": "order_id", "type": "int"},
          {"name": "customer_name", "type": "string"},
          {"name": "customer_email", "type": "string"},
          {"name": "product_id", "type": "int"},
          {"name": "price", "type": "double"},
          {"name": "quantity", "type": "int"},
          {"name": "total", "type": "double"}
        ]
      }

  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "{{ outputs.ion_to_avro.uri }}"
    destinationTable: "my_project.my_dataset.my_table"
    format: AVRO
    avroOptions:
      useAvroLogicalTypes: true
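
The flow above passes the URI produced by `ion_to_avro` straight to `from`, whereas the other example on this page passes a `gs://` URI. If the converted file lives in Kestra's internal storage rather than in GCS, one possible approach is to stage it in a bucket first. The fragment below is a sketch under that assumption: it relies on the `io.kestra.plugin.gcp.gcs.Upload` task with `from`/`to` properties and a `uri` output, and the task id `upload_to_gcs` and the bucket `my_bucket` are hypothetical.

```yaml
  # Hypothetical staging step: copy the Avro file into a GCS bucket
  - id: upload_to_gcs
    type: io.kestra.plugin.gcp.gcs.Upload
    from: "{{ outputs.ion_to_avro.uri }}"
    to: "gs://my_bucket/orders/orders.avro"   # assumed bucket and path

  # Load the staged object; `from` now receives a gs:// URI
  - id: load_from_gcs
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - "{{ outputs.upload_to_gcs.uri }}"
    destinationTable: "my_project.my_dataset.my_table"
    format: AVRO
    avroOptions:
      useAvroLogicalTypes: true
```

These two tasks would replace the final `load_from_gcs` task of the flow; the earlier download and conversion steps stay unchanged.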

yaml
id: gcp_bq_load_files_test
namespace: company.team

tasks:
  - id: load_files_test
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    destinationTable: "myDataset.myTable"
    ignoreUnknownValues: true
    schema:
      fields:
        - name: colA
          type: STRING
        - name: colB
          type: NUMERIC
        - name: colC
          type: STRING
    format: CSV
    csvOptions:
      allowJaggedRows: true
      encoding: UTF-8
      fieldDelimiter: ","
    from:
      - gs://myBucket/myFile.csv

Properties

autodetect boolean | string
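
As a rough illustration, `autodetect` could stand in for an explicit `schema` block. The sketch assumes the property maps to BigQuery's schema auto-detection, which applies to CSV and newline-delimited JSON sources; the flow id, bucket, and table names are hypothetical.

```yaml
id: gcp_bq_load_autodetect
namespace: company.team

tasks:
  - id: load_json
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/events.json   # hypothetical newline-delimited JSON object
    destinationTable: "myDataset.events"
    format: JSON
    autodetect: true   # no schema block; column names and types are inferred
```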

avroOptions

Definitions

io.kestra.plugin.gcp.bigquery.AbstractLoad-AvroOptions

useAvroLogicalTypes boolean | string

clusteringFields array

SubType: string

createDisposition string

Possible Values

CREATE_IF_NEEDED, CREATE_NEVER

csvOptions

Definitions

io.kestra.plugin.gcp.bigquery.AbstractLoad-CsvOptions

allowJaggedRows boolean | string

allowQuotedNewLines boolean | string

encoding string

fieldDelimiter string

quote string

skipLeadingRows integer | string
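
The `csvOptions` fields listed above can be combined as in the following sketch, which loads a semicolon-delimited file with one header row. The file path, table name, and delimiter are hypothetical, and the comments describe the usual BigQuery meaning of each option rather than anything stated on this page.

```yaml
tasks:
  - id: load_csv_with_header
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/exports/orders.csv   # hypothetical object
    destinationTable: "myDataset.orders"
    format: CSV
    autodetect: true
    csvOptions:
      skipLeadingRows: 1          # skip the header row
      fieldDelimiter: ";"         # assumed delimiter of the source file
      quote: "\""                 # character that wraps quoted fields
      allowQuotedNewLines: true   # accept line breaks inside quoted fields
      encoding: UTF-8
```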

destinationTable string

format string

Possible Values

CSV, JSON, AVRO, PARQUET, ORC

from array

SubType: string

ignoreUnknownValues boolean | string

impersonatedServiceAccount string

location string

maxBadRecords integer | string
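
A tolerant load might pair `ignoreUnknownValues` with `maxBadRecords`. The sketch below assumes the standard BigQuery load-job semantics for both settings and an already existing destination table; the names and the threshold of 10 are hypothetical.

```yaml
tasks:
  - id: load_tolerant
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/raw/clicks.csv   # hypothetical object
    destinationTable: "myDataset.clicks"   # assumed to exist already
    format: CSV
    ignoreUnknownValues: true   # values that match no column are dropped
    maxBadRecords: 10           # the job fails once more than 10 rows are rejected
```

Rejected rows should then show up in the `bad.records` metric listed under Metrics.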

projectId string

retryAuto

Definitions

io.kestra.core.models.tasks.retrys.Constant

interval* string

Format: duration

type* object

behavior string

Default: RETRY_FAILED_TASK

Possible Values

RETRY_FAILED_TASK, CREATE_NEW_EXECUTION

maxAttempts integer

Minimum: >= 1

maxDuration string

Format: duration

warningOnRetry boolean

Default: false

io.kestra.core.models.tasks.retrys.Exponential

interval* string

Format: duration

maxInterval* string

Format: duration

type* object

behavior string

Default: RETRY_FAILED_TASK

Possible Values

RETRY_FAILED_TASK, CREATE_NEW_EXECUTION

delayFactor number

maxAttempts integer

Minimum: >= 1

maxDuration string

Format: duration

warningOnRetry boolean

Default: false

io.kestra.core.models.tasks.retrys.Random

maxInterval* string

Format: duration

minInterval* string

Format: duration

type* object

behavior string

Default: RETRY_FAILED_TASK

Possible Values

RETRY_FAILED_TASK, CREATE_NEW_EXECUTION

maxAttempts integer

Minimum: >= 1

maxDuration string

Format: duration

warningOnRetry boolean

Default: false
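
To make the three retry definitions more concrete, here is a sketch of `retryAuto` using the exponential variant. It assumes that `type` takes the lowercase values `constant`, `exponential`, or `random`, as in Kestra's general retry configuration, and that durations use ISO 8601 notation; the specific values are hypothetical.

```yaml
tasks:
  - id: load_with_retry
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/myFile.csv
    destinationTable: "myDataset.myTable"
    format: CSV
    autodetect: true
    retryAuto:
      type: exponential     # assumed selector for the Exponential definition
      interval: PT5S        # initial wait between attempts
      maxInterval: PT1M     # upper bound on the wait
      delayFactor: 2        # multiplier applied to the wait after each attempt
      maxAttempts: 5
      warningOnRetry: true
```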

retryMessages array

SubType: string

Default: ["due to concurrent update", "Retrying the job may solve the problem", "Retrying may solve the problem"]

retryReasons array

SubType: string

Default: ["rateLimitExceeded", "jobBackendError", "backendError", "internalError", "jobInternalError"]

schema object

schemaUpdateOptions array

SubType: string

Possible Values

ALLOW_FIELD_ADDITION, ALLOW_FIELD_RELAXATION
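
The following sketch shows how `schemaUpdateOptions` might be set when appending files whose schema has grown. It assumes the BigQuery behavior in which these options are honored for append jobs; the bucket and table names are hypothetical.

```yaml
tasks:
  - id: append_with_schema_evolution
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/daily/orders.avro   # hypothetical object
    destinationTable: "myDataset.orders"
    format: AVRO
    writeDisposition: WRITE_APPEND
    schemaUpdateOptions:
      - ALLOW_FIELD_ADDITION     # new nullable columns may be added
      - ALLOW_FIELD_RELAXATION   # REQUIRED columns may become NULLABLE
```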

scopes array

SubType: string

Default: ["https://www.googleapis.com/auth/cloud-platform"]

serviceAccount string
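
The connection properties (`projectId`, `location`, `serviceAccount`) could be set as in this sketch. It assumes that `serviceAccount` accepts the JSON key content and that the key is stored as a Kestra secret; the secret name `GCP_SERVICE_ACCOUNT_JSON`, the project id, and the location are hypothetical.

```yaml
tasks:
  - id: load_with_credentials
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    projectId: my-gcp-project   # hypothetical project id
    location: EU                # hypothetical dataset location
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"   # assumed secret name
    from:
      - gs://myBucket/myFile.csv
    destinationTable: "myDataset.myTable"
    format: CSV
    autodetect: true
```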

timePartitioningField string

timePartitioningType string

Default: DAY

Possible Values

DAY, HOUR, MONTH, YEAR
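
As an illustration of the partitioning properties together with `clusteringFields`, the sketch below loads into a table partitioned by day and clustered on one column. The column names `order_date` and `customer_id`, the bucket, and the table are hypothetical.

```yaml
tasks:
  - id: load_partitioned
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/orders/orders.parquet   # hypothetical object
    destinationTable: "myDataset.orders_partitioned"
    format: PARQUET
    timePartitioningField: order_date   # assumed DATE or TIMESTAMP column
    timePartitioningType: DAY
    clusteringFields:
      - customer_id                     # assumed column in the source data
```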

writeDisposition string

Possible Values

WRITE_TRUNCATE, WRITE_TRUNCATE_DATA, WRITE_APPEND, WRITE_EMPTY
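
A full refresh could combine `createDisposition` and `writeDisposition` as sketched here. The comments reflect the usual BigQuery meaning of the values; the bucket and table names are hypothetical.

```yaml
tasks:
  - id: full_refresh
    type: io.kestra.plugin.gcp.bigquery.LoadFromGcs
    from:
      - gs://myBucket/snapshots/products.avro   # hypothetical object
    destinationTable: "myDataset.products"
    format: AVRO
    createDisposition: CREATE_IF_NEEDED   # create the table when it is missing
    writeDisposition: WRITE_TRUNCATE      # replace existing rows instead of appending
```

Switching to `WRITE_APPEND` would keep the existing rows, and `WRITE_EMPTY` would make the job fail if the table already holds data.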

Outputs

destinationTable string

jobId string

rows integer
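
The outputs listed above can be read by a later task through Pebble expressions. This fragment is a sketch that assumes the load task is named `load_from_gcs`, as in the first example, and uses the core `Log` task; the task id `log_result` is hypothetical.

```yaml
  # Appended after the load_from_gcs task in the same flow
  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: "Job {{ outputs.load_from_gcs.jobId }} loaded {{ outputs.load_from_gcs.rows }} rows into {{ outputs.load_from_gcs.destinationTable }}"
```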

Metrics

bad.records counter

Unit: records

duration timer

input.bytes counter

Unit: bytes

input.files counter

Unit: files

output.bytes counter

Unit: bytes

output.rows counter

Unit: records
