CreateDataset
Certified

Create a BigQuery dataset or update if it already exists.

yaml
type: "io.kestra.plugin.gcp.bigquery.CreateDataset"

Create a dataset if it does not exist.

yaml
id: gcp_bq_create_dataset
namespace: company.team

tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    location: "EU"
    ifExists: "SKIP"
Properties

The dataset's access control configuration.

Definitions
entity (required)

The GCP entity.

type (required, string)
Possible values: DOMAIN, GROUP, USER, IAM_MEMBER

The type of the entity (USER, GROUP, DOMAIN or IAM_MEMBER).

value (required, string)

The value for the entity.

For example, the user's email address if the type is USER.

role (required, string)
Possible values: READER, WRITER, OWNER

The role to assign to the entity.
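
The access control fields above could be combined roughly as follows. This is a minimal sketch: the property name `acl`, the nesting of `type` and `value` under `entity`, and the email addresses are assumptions for illustration, not confirmed by this page.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    # `acl` is an assumed property name; each entry pairs an entity with a role.
    acl:
      - entity:
          type: USER
          value: "analyst@example.com"    # hypothetical user email
        role: READER
      - entity:
          type: GROUP
          value: "data-eng@example.com"   # hypothetical group address
        role: WRITER
```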

The default encryption key for all tables in the dataset.

Once this property is set, all newly-created partitioned tables in the dataset will have their encryption key set to this value, unless the table creation request (or query) overrides the key.

Definitions
kmsKeyName (string)
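
As an illustration, a default encryption key might be set like this. The wrapping property name `defaultEncryptionConfiguration` is an assumption, and the project, key ring, and key names are hypothetical; only `kmsKeyName` comes from the definition above. The value follows the usual Cloud KMS resource name layout.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    location: "EU"
    # Assumed property name wrapping the kmsKeyName definition.
    defaultEncryptionConfiguration:
      # Hypothetical Cloud KMS key resource name.
      kmsKeyName: "projects/my-project/locations/europe/keyRings/my-ring/cryptoKeys/my-key"
```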

Optional. The default partition expiration time for all partitioned tables in the dataset, in milliseconds.

Once this property is set, all newly-created partitioned tables in the dataset will have an expirationMs property in the timePartitioning settings set to this value. Changing the value only affects new tables, not existing ones. The storage in a partition will have an expiration time of its partition time plus this value. Setting this property overrides the use of defaultTableExpirationMs for partitioned tables: only one of defaultTableExpirationMs and defaultPartitionExpirationMs will be used for any new partitioned table. If you provide an explicit timePartitioning.expirationMs when creating or updating a partitioned table, that value takes precedence over the default partition expiration time indicated by this property. The value may be null.

The default lifetime of all tables in the dataset, in milliseconds.

The minimum value is 3600000 milliseconds (one hour). Once this property is set, all newly-created tables in the dataset will have an expirationTime property set to the creation time plus the value in this property. Changing the value only affects new tables, not existing ones. When the expirationTime for a given table is reached, that table will be deleted automatically. If a table's expirationTime is modified or removed before the table expires, or if you provide an explicit expirationTime when creating a table, that value takes precedence over the default expiration time indicated by this property. This property is experimental and might change or be removed.
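
Because both expiration values are expressed in milliseconds, it can help to write the conversion out next to the value. The sketch below assumes the task exposes the partition setting under the same name as the BigQuery field mentioned above (`defaultPartitionExpirationMs`); the key for the default table lifetime is not shown on this page, so it is left out.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    location: "EU"
    # Assumed property name, taken from the field named in the description.
    # 30 days = 30 * 24 * 60 * 60 * 1000 ms = 2592000000 ms
    defaultPartitionExpirationMs: 2592000000
```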

The dataset description.

Default: ERROR
Possible values: ERROR, UPDATE, SKIP

Policy to apply if a dataset already exists.
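
For instance, a flow that should update an existing dataset rather than skip it or fail could set the policy to UPDATE. This is a sketch only: the flow and task IDs are made up, and `description` is assumed to be the property name for the dataset description.

```yaml
id: gcp_bq_update_dataset
namespace: company.team

tasks:
  - id: create_or_update_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    location: "EU"
    description: "Curated sales tables"   # assumed property name, hypothetical text
    ifExists: "UPDATE"                     # ERROR is the default
```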

The GCP service account to impersonate.

SubType: string

The dataset's labels.

Automatic retry for retryable BigQuery exceptions.

Some exceptions (especially rate limits) are not retried by default by the BigQuery client, so the task applies a transparent retry by default (not the Kestra one) to handle this case. The default is an exponential retry with a 5-second interval, for a maximum of 15 minutes and ten attempts.

Definitions

Constant retry
interval (required, string)
Format: duration
type (required, object)
behavior (string)
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASK, CREATE_NEW_EXECUTION
maxAttempts (integer)
Minimum: >= 1
maxDuration (string)
Format: duration
warningOnRetry (boolean)
Default: false

Exponential retry
interval (required, string)
Format: duration
maxInterval (required, string)
Format: duration
type (required, object)
behavior (string)
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASK, CREATE_NEW_EXECUTION
delayFactor (number)
maxAttempts (integer)
Minimum: >= 1
maxDuration (string)
Format: duration
warningOnRetry (boolean)
Default: false

Random retry
maxInterval (required, string)
Format: duration
minInterval (required, string)
Format: duration
type (required, object)
behavior (string)
Default: RETRY_FAILED_TASK
Possible values: RETRY_FAILED_TASKCREATE_NEW_EXECUTION
maxAttempts (integer)
Minimum: >= 1
maxDuration (string)
Format: duration
warningOnRetry (boolean)
Default: false
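
A retry block built from the definition that carries `interval`, `maxInterval`, and `delayFactor` might look like the sketch below, mirroring the defaults described earlier (5 seconds, 15 minutes, ten attempts). The property name `retryAuto` and the `type: exponential` discriminator are assumptions, and the `maxInterval` value is purely illustrative. Durations are written in ISO 8601 form.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    # Assumed property name for the automatic retry settings.
    retryAuto:
      type: exponential    # assumed discriminator value
      interval: PT5S       # first wait of 5 seconds
      maxInterval: PT1M    # illustrative cap on a single wait
      maxDuration: PT15M   # stop retrying after 15 minutes overall
      maxAttempts: 10      # stop retrying after ten attempts
```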
SubType: string
Default: ["due to concurrent update","Retrying the job may solve the problem","Retrying may solve the problem"]

The messages which would trigger an automatic retry.

Each message is matched as a case-insensitive substring of the full error message.

SubType: string
Default: ["rateLimitExceeded","jobBackendError","backendError","internalError","jobInternalError"]

The reasons which would trigger an automatic retry.
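
As a sketch, both lists could be narrowed to a few entries taken from the defaults above. The property names `retryReasons` and `retryMessages` are assumptions, and this page does not state whether a custom list replaces or extends the defaults.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    # Assumed property names; values are copied from the documented defaults.
    retryReasons:
      - "rateLimitExceeded"
      - "backendError"
    retryMessages:
      - "due to concurrent update"   # matched as a case-insensitive substring
```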

SubType: string
Default: ["https://www.googleapis.com/auth/cloud-platform"]

The GCP scopes to be used.

The GCP service account.
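
The authentication settings might be supplied along these lines. The property names `serviceAccount`, `scopes`, and `projectId` are assumptions, as are the project ID and the secret name `GCP_SERVICE_ACCOUNT_JSON`; the sketch reads the key from a Kestra secret instead of embedding it in the flow.

```yaml
tasks:
  - id: create_dataset
    type: io.kestra.plugin.gcp.bigquery.CreateDataset
    name: "my_dataset"
    location: "EU"
    projectId: "my-project"   # hypothetical project ID, assumed property name
    # Hypothetical secret holding the service account JSON key.
    serviceAccount: "{{ secret('GCP_SERVICE_ACCOUNT_JSON') }}"
    scopes:
      - "https://www.googleapis.com/auth/cloud-platform"   # the documented default
```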

The dataset's user-defined ID.

A user-friendly description for the dataset.

A user-friendly name for the dataset.

The geographic location where the dataset should reside.

This property is experimental and might change or be removed. See Dataset Location.

The GCP project ID.