ExtractToGcs
Extract data from a BigQuery table to GCS.
type: "io.kestra.plugin.gcp.bigquery.ExtractToGcs"
Examples
Extract a BigQuery table to a GCS bucket.
id: gcp_bq_extract_to_gcs
namespace: company.team

tasks:
  - id: extract_to_gcs
    type: io.kestra.plugin.gcp.bigquery.ExtractToGcs
    destinationUris:
      - "gs://bucket_name/filename.csv"
    sourceTable: "my_project.my_dataset.my_table"
    format: CSV
    fieldDelimiter: ';'
    printHeader: true
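A variation on the same flow, offered as a sketch rather than a reference configuration. BigQuery accepts a single `*` wildcard in a destination URI so that a large table can be split across several files, and the `compression` property reduces their size. The flow id, bucket, path, and label are placeholders, and `GZIP` is assumed to be an accepted `compression` value (it is one of the compression types BigQuery extract jobs support).

```yaml
id: gcp_bq_extract_compressed
namespace: company.team

tasks:
  - id: extract_to_gcs
    type: io.kestra.plugin.gcp.bigquery.ExtractToGcs
    # Placeholder bucket and path; the * wildcard lets BigQuery shard the export.
    destinationUris:
      - "gs://bucket_name/exports/my_table-*.csv.gz"
    sourceTable: "my_project.my_dataset.my_table"
    format: CSV
    # Assumed value; check the values the plugin accepts before relying on it.
    compression: GZIP
    printHeader: true
    # Hypothetical label: lowercase key and value, each under 63 characters.
    labels:
      team: data
```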
Properties
compression string
The compression to apply to exported files. If not set, exported files are not compressed.
destinationUris array
The list of fully-qualified Google Cloud Storage URIs (e.g. gs://bucket/path) where the extracted table should be written.
fieldDelimiter string
The delimiter to use between fields in the exported data. By default "," is used.
format string
The format of the exported files. If not set, the table is exported in CSV format.
impersonatedServiceAccount string
The GCP service account to impersonate.
jobTimeoutMs integer | string
Optional. Job timeout in milliseconds. If this time limit is exceeded, BigQuery may attempt to terminate the job.
labels object
The labels associated with this job. You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter, and each label must have a different key.
location string
The geographic location where the dataset should reside.
This property is experimental and might be subject to change or removed.
See Dataset Location
printHeader boolean | string
Whether to print a header row in the results. By default, a header is printed.
projectId string
The GCP project ID.
retryAuto Non-dynamic Constant | Exponential | Random
Automatic retry for retryable BigQuery exceptions.
Some exceptions (especially rate limits) are not retried by default by the BigQuery client, so the task applies its own transparent retry (separate from the Kestra retry) to handle them. The default is an exponential retry with a 5-second interval, a maximum duration of 15 minutes, and ten attempts.
retryMessages array
Default: ["due to concurrent update","Retrying the job may solve the problem","Retrying may solve the problem"]
The messages that trigger an automatic retry.
Each message is matched as a case-insensitive substring of the full error message.
retryReasons array
Default: ["rateLimitExceeded","jobBackendError","backendError","internalError","jobInternalError"]
The reasons that trigger an automatic retry.
scopes array
Default: ["https://www.googleapis.com/auth/cloud-platform"]
The GCP scopes to be used.
serviceAccount string
The GCP service account.
sourceTable string
The table to export.
useAvroLogicalTypes boolean | string
Optional. If format is set to "AVRO", this flag indicates whether to extract applicable column types (such as TIMESTAMP) to their corresponding AVRO logical types (timestamp-micros) instead of only their raw types (avro-long).
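A minimal sketch of an Avro export that combines `format`, `useAvroLogicalTypes`, and `jobTimeoutMs`. The flow id, project, bucket, and file name are placeholders, and the 10-minute timeout is an arbitrary illustrative value.

```yaml
id: gcp_bq_extract_avro
namespace: company.team

tasks:
  - id: extract_to_gcs
    type: io.kestra.plugin.gcp.bigquery.ExtractToGcs
    projectId: my_project          # placeholder project ID
    destinationUris:
      - "gs://bucket_name/exports/my_table.avro"
    sourceTable: "my_project.my_dataset.my_table"
    format: AVRO
    # Export TIMESTAMP and similar columns as Avro logical types
    # rather than as raw longs.
    useAvroLogicalTypes: true
    # 600000 ms = 10 minutes; illustrative value only.
    jobTimeoutMs: 600000
```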
Outputs
destinationUris array
The URIs of the destination files.
fileCounts array
Number of extracted files
jobId string
The job ID.
sourceTable string
The source table.
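The outputs can be read by later tasks in the same flow through Kestra's `outputs` expression. The sketch below assumes the core `io.kestra.plugin.core.log.Log` task is available in your Kestra version; the flow and task ids are placeholders.

```yaml
id: gcp_bq_extract_and_log
namespace: company.team

tasks:
  - id: extract_to_gcs
    type: io.kestra.plugin.gcp.bigquery.ExtractToGcs
    destinationUris:
      - "gs://bucket_name/filename.csv"
    sourceTable: "my_project.my_dataset.my_table"
    format: CSV

  # Reads the outputs of the task above by its id.
  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: "Job {{ outputs.extract_to_gcs.jobId }} wrote {{ outputs.extract_to_gcs.destinationUris }}"
```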
Metrics
duration timer
The time it took for the job to run.
output.file_counts counter
The number of files extracted to GCS.
Definitions
io.kestra.core.models.tasks.retrys.Constant
interval *Required string
Format: duration
type *Required object
behavior string
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASK, CREATE_NEW_EXECUTION
maxAttempts integer
Minimum: 1
maxDuration string
Format: duration
warningOnRetry boolean
Default: false
io.kestra.core.models.tasks.retrys.Random
maxInterval *Required string
Format: duration
minInterval *Required string
Format: duration
type *Required object
behavior string
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASK, CREATE_NEW_EXECUTION
maxAttempts integer
Minimum: 1
maxDuration string
Format: duration
warningOnRetry boolean
Default: false
io.kestra.core.models.tasks.retrys.Exponential
interval *Required string
Format: duration
maxInterval *Required string
Format: duration
type *Required object
behavior string
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASK, CREATE_NEW_EXECUTION
delayFactor number
maxAttempts integer
Minimum: 1
maxDuration string
Format: duration
warningOnRetry boolean
Default: false
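As a sketch of how these definitions map onto the `retryAuto` property, the flow below spells out an exponential policy close to the documented default (5-second interval, 15-minute maximum duration, ten attempts). Several details are assumptions: that `type: exponential` is the value selecting the Exponential definition, that durations are written as ISO 8601 strings, and the `maxInterval` of one minute, which the documented default does not state.

```yaml
id: gcp_bq_extract_with_retry
namespace: company.team

tasks:
  - id: extract_to_gcs
    type: io.kestra.plugin.gcp.bigquery.ExtractToGcs
    destinationUris:
      - "gs://bucket_name/filename.csv"
    sourceTable: "my_project.my_dataset.my_table"
    format: CSV
    retryAuto:
      type: exponential      # assumed discriminator for the Exponential definition
      interval: PT5S         # first wait: 5 seconds
      maxInterval: PT1M      # assumed cap on a single wait; not from the docs
      maxDuration: PT15M     # stop retrying after 15 minutes overall
      maxAttempts: 10
    # Narrow the retry to rate-limit errors only (a subset of the default list).
    retryReasons:
      - rateLimitExceeded
```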