Query

```yaml
type: "io.kestra.plugin.gcp.bigquery.Query"
```

Execute a BigQuery SQL query in a specific BigQuery database.

Examples

Create a table with a custom query.

```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.my_table"
    writeDisposition: WRITE_APPEND
    sql: |
      SELECT
        "hello" AS string,
        NULL AS `nullable`,
        1 AS int,
        1.25 AS float,
        DATE("2008-12-25") AS date,
        DATETIME "2008-12-25 15:30:00.123456" AS datetime,
        TIME(DATETIME "2008-12-25 15:30:00.123456") AS time,
        TIMESTAMP("2008-12-25 15:30:00.123456") AS timestamp,
        ST_GEOGPOINT(50.6833, 2.9) AS geopoint,
        ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS `array`,
        STRUCT(4 AS x, 0 AS y, ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS z) AS `struct`
```

Execute a query and use the fetched result set in another task.

```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: fetch
    type: io.kestra.plugin.gcp.bigquery.Query
    fetch: true
    sql: |
      SELECT 1 AS id, "John" AS name
      UNION ALL
      SELECT 2 AS id, "Doe" AS name
  - id: use_fetched_data
    type: io.kestra.plugin.core.debug.Return
    format: |
      {% for row in outputs.fetch.rows %}
      id : {{ row.id }}, name: {{ row.name }}
      {% endfor %}
```

Properties

`allowLargeResults`

Type: boolean
Dynamic: ❌
Required: ❌

Sets whether the job is enabled to create arbitrarily large results.

If true, the query is allowed to create large results at a slight cost in performance. destinationTable must be provided.

`clusteringFields`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌

The clustering specification for the destination table.

`createDisposition`

Type: string
Dynamic: ❌
Required: ❌
Possible Values:
- CREATE_IF_NEEDED
- CREATE_NEVER

Whether the job is allowed to create tables.

`defaultDataset`

Type: string
Dynamic: ✔️
Required: ❌

Sets the default dataset.

This dataset is used for all unqualified table names used in the query.
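As an illustration, the sketch below sets a default dataset so that the table name in the query can stay unqualified. The dataset and table names (`my_dataset`, `my_table`) are placeholders chosen for this example, not names defined by the plugin.

```yaml
id: gcp_bq_default_dataset
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    # Hypothetical dataset; the unqualified table below resolves against it
    defaultDataset: my_dataset
    fetch: true
    sql: |
      SELECT COUNT(*) AS row_count
      FROM my_table
```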

`destinationTable`

Type: string
Dynamic: ✔️
Required: ❌

The table in which to store the query results.

If not provided, a new table is created.

`dryRun`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Whether to run the job as a dry run.

A valid query returns a mostly empty response with some processing statistics, while an invalid query returns the same error as an actual run would.
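A dry run can serve as a validation step before the real query is executed. The following is a minimal sketch; the table `my_dataset.my_table` is a placeholder.

```yaml
id: gcp_bq_dry_run
namespace: company.team

tasks:
  - id: validate
    type: io.kestra.plugin.gcp.bigquery.Query
    # The query is checked but not executed
    dryRun: true
    sql: |
      SELECT name
      FROM my_dataset.my_table
```

If the SQL is invalid, the task should fail with the same error an actual run would produce.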

`fetch`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Whether to fetch the data from the query result into the task output.

`fetchOne`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Whether to fetch only one row from the query result into the task output.
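The sketch below fetches a single row and reads it in a later task. It assumes the row is exposed as `outputs.<task id>.row`, by analogy with the `rows` output used in the examples above; check the task outputs of your plugin version before relying on that name.

```yaml
id: gcp_bq_fetch_one
namespace: company.team

tasks:
  - id: fetch_one
    type: io.kestra.plugin.gcp.bigquery.Query
    fetchOne: true
    sql: |
      SELECT 1 AS id, "John" AS name
  - id: use_row
    type: io.kestra.plugin.core.debug.Return
    # Assumption: the single fetched row is available under `row`
    format: "id: {{ outputs.fetch_one.row.id }}, name: {{ outputs.fetch_one.row.name }}"
```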

`flattenResults`

Type: boolean
Dynamic: ❌
Required: ❌
Default: true

Sets whether nested and repeated fields should be flattened.

If set to false, allowLargeResults must be true.

`jobTimeout`

Type: string
Dynamic: ❌
Required: ❌
Format: duration

Job timeout.

If this time limit is exceeded, BigQuery may attempt to terminate the job.
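Durations in Kestra are written in ISO 8601 notation. As a sketch, a ten-minute timeout could look like this (the value `PT10M` is only an example):

```yaml
id: gcp_bq_timeout
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    # ISO 8601 duration: 10 minutes
    jobTimeout: PT10M
    sql: |
      SELECT 1 AS id
```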

`labels`

Type: object
SubType: string
Dynamic: ✔️
Required: ❌

The labels associated with this job.

You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.
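For example, labels are given as a map of keys to string values. The keys and values below (`team`, `env`) are illustrative and follow the constraints listed above: lowercase, starting with a letter, unique keys.

```yaml
id: gcp_bq_labels
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    # Hypothetical labels used to group jobs
    labels:
      team: data
      env: dev
    sql: |
      SELECT 1 AS id
```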

`legacySql`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Whether to use BigQuery's legacy SQL dialect for this query.

By default this property is set to false.

`location`

Type: string
Dynamic: ✔️
Required: ❌

The geographic location where the dataset should reside.

This property is experimental and might be changed or removed.

See Dataset Location

`maxResults`

Type: integer
Dynamic: ❌
Required: ❌

This is only supported in the fast query path.

The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.

`maximumBillingTier`

Type: integer
Dynamic: ❌
Required: ❌

Limits the billing tier for this job.

Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default.

`maximumBytesBilled`

Type: integer
Dynamic: ❌
Required: ❌

Limits the bytes billed for this job.

Queries that will have bytes billed beyond this limit will fail (without incurring a charge). If unspecified, this will be set to your project default.
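A byte limit can act as a cost guard for queries that scan more data than expected. The sketch below caps the job at 1000000000 bytes (about 1 GB); both the limit and the table name are example values.

```yaml
id: gcp_bq_bytes_limit
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    # The job fails without charge if it would bill more than this many bytes
    maximumBytesBilled: 1000000000
    sql: |
      SELECT name
      FROM my_dataset.my_table
```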

`priority`

Type: string
Dynamic: ❌
Required: ❌
Default: INTERACTIVE
Possible Values:
- INTERACTIVE
- BATCH

Sets a priority for the query.

`projectId`

Type: string
Dynamic: ✔️
Required: ❌

The GCP project ID.

`rangePartitioningEnd`

Type: integer
Dynamic: ✔️
Required: ❌

The end of range partitioning, inclusive.

`rangePartitioningField`

Type: string
Dynamic: ✔️
Required: ❌

Range partitioning field for the destination table.

`rangePartitioningInterval`

Type: integer
Dynamic: ✔️
Required: ❌

The width of each interval.

`rangePartitioningStart`

Type: integer
Dynamic: ✔️
Required: ❌

The start of range partitioning, inclusive.
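The four range partitioning properties are meant to be used together with a destination table. A minimal sketch, assuming an integer column named `customer_id` (a placeholder) and buckets of width 10 between 0 and 100:

```yaml
id: gcp_bq_range_partitioning
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.customers_partitioned"
    # Hypothetical integer column used as the partitioning key
    rangePartitioningField: customer_id
    rangePartitioningStart: 0
    rangePartitioningEnd: 100
    rangePartitioningInterval: 10
    sql: |
      SELECT 42 AS customer_id, "John" AS name
```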

`retryAuto`

Type:
Dynamic: ❌
Required: ❌

`retryMessages`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [due to concurrent update, Retrying the job may solve the problem]

The messages which would trigger an automatic retry.

Message is tested as a substring of the full message, and is case insensitive.

`retryReasons`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [rateLimitExceeded, jobBackendError, internalError, jobInternalError]

The reasons which would trigger an automatic retry.
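Both retry lists can be set on the task. The sketch below narrows them to a subset of the default values shown above; it assumes that providing a list replaces the defaults rather than extending them, which is worth verifying for your plugin version.

```yaml
id: gcp_bq_retry
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    # Retry only on these BigQuery error reasons
    retryReasons:
      - rateLimitExceeded
      - jobBackendError
    # Retry when the error message contains this substring (case insensitive)
    retryMessages:
      - due to concurrent update
    sql: |
      SELECT 1 AS id
```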

`schemaUpdateOptions`

Type: array
SubType: string
Dynamic: ❌
Required: ❌

[Experimental] Options allowing the schema of the destination table to be updated as a side effect of the query job.

Schema update options are supported in two cases:
- when writeDisposition is WRITE_APPEND;
- when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators.

For normal tables, WRITE_TRUNCATE will always overwrite the schema.
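As a sketch of the first case, the flow below appends to an existing table and allows the job to add a new column. `ALLOW_FIELD_ADDITION` is one of BigQuery's schema update options (the other being `ALLOW_FIELD_RELAXATION`); the table and column names are placeholders.

```yaml
id: gcp_bq_schema_update
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.my_table"
    writeDisposition: WRITE_APPEND
    # Let the append add columns missing from the destination schema
    schemaUpdateOptions:
      - ALLOW_FIELD_ADDITION
    sql: |
      SELECT 1 AS id, "John" AS name, "new" AS added_column
```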

`scopes`

Type: array
SubType: string
Dynamic: ✔️
Required: ❌
Default: [https://www.googleapis.com/auth/cloud-platform]

The GCP scopes to be used.

`serviceAccount`

Type: string
Dynamic: ✔️
Required: ❌

The GCP service account key.
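Rather than pasting the key into the flow, it can be read from a secret. The sketch below uses Kestra's `secret()` function; the secret name `GCP_SERVICE_ACCOUNT_JSON` and the project ID are assumptions for this example.

```yaml
id: gcp_bq_service_account
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    projectId: my_project
    # Hypothetical secret holding the service account key
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    sql: |
      SELECT 1 AS id
```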

`sql`

Type: string
Dynamic: ✔️
Required: ❌

The SQL query to run.

`store`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Whether to store the data from the query result in an Ion-serialized data file.
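Storing the result is generally preferable to fetching it when the result set is large, since only a file reference travels between tasks. The sketch below assumes the stored file is exposed as `outputs.<task id>.uri`; confirm the output name against your plugin version.

```yaml
id: gcp_bq_store
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    store: true
    sql: |
      SELECT 1 AS id, "John" AS name
  - id: show_uri
    type: io.kestra.plugin.core.debug.Return
    # Assumption: the stored file location is available under `uri`
    format: "{{ outputs.query.uri }}"
```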

`timePartitioningField`

Type: string
Dynamic: ✔️
Required: ❌

The time partitioning field for the destination table.

`timePartitioningType`

Type: string
Dynamic: ✔️
Required: ❌
Default: DAY
Possible Values:
- DAY
- HOUR
- MONTH
- YEAR

The time partitioning type specification.
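Time partitioning and clustering can be combined on the same destination table. A minimal sketch, assuming a DATE column `event_date` and a clustering column `country` (both placeholder names):

```yaml
id: gcp_bq_time_partitioning
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.events_partitioned"
    # One partition per day, based on the hypothetical event_date column
    timePartitioningField: event_date
    timePartitioningType: DAY
    clusteringFields:
      - country
    sql: |
      SELECT
        DATE("2008-12-25") AS event_date,
        "FR" AS country,
        1 AS visits
```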

`useLegacySql`

Type: boolean
Dynamic: ❌
Required: ❌
Default: false

Sets whether to use BigQuery's legacy SQL dialect for this query.

By default this property is set to false.

`useQueryCache`

Type: boolean
Dynamic: ❌
Required: ❌

Sets whether to look for the result in the query cache.

The query cache is a best-effort cache that will be flushed whenever tables in the query are modified. Moreover, the query cache is only available when destinationTable is not set.