Batch

```yaml
type: "io.kestra.plugin.gcp.runner.Batch"
```

Task runner that executes a task inside a job in Google Cloud Batch.

This task runner is container-based, so the containerImage property must be set. To use it, you need the 'Batch Job Editor' and 'Logs Viewer' roles.

To access the task's working directory, use the {{ workingDir }} Pebble expression or the WORKING_DIR environment variable. Input files and namespace files will be available in this directory.

To generate output files, you can either use the outputFiles task property and create a file with the same name in the task's working directory, or create any file in the output directory, which can be accessed via the {{ outputDir }} Pebble expression or the OUTPUT_DIR environment variable.
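As a sketch of both approaches (the projectId, region, and bucket variables are placeholders):

```yaml
tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    containerImage: ubuntu
    outputFiles:
      - result.txt
    taskRunner:
      type: io.kestra.plugin.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      bucket: "{{vars.bucket}}"
    commands:
      # matches an entry in outputFiles, created in the working directory
      - echo "via outputFiles" > {{workingDir}}/result.txt
      # any file written to the output directory is also collected
      - echo "via outputDir" > {{outputDir}}/extra.txt
```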

To use the inputFiles, outputFiles or namespaceFiles properties, make sure to set the bucket property. The bucket serves as an intermediary storage layer for the task runner. Input and namespace files will be uploaded to the Cloud Storage bucket before the task run. Similarly, the task runner will store outputFiles in this bucket during the task run. Finally, the task runner will make those files available for download and preview from the UI by sending them to internal storage. To make it easier to track where all files are stored, the task runner generates a folder for each task run. You can access that folder using the {{ bucketPath }} Pebble expression or the BUCKET_PATH environment variable.

Warning: unlike other task runners, this task runner does not run the task in the working directory but in the root directory. You must use the {{ workingDir }} Pebble expression or the WORKING_DIR environment variable to access files.

Note that if the Kestra Worker running this task is terminated, the batch job will still run until completion. After restarting, the Worker will resume processing the existing job unless resume is set to false.

Examples

Execute a Shell command.

```yaml
id: new-shell
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
    commands:
      - echo "Hello World"
```

Pass input files to the task, execute a Shell command, then retrieve output files.

```yaml
id: new-shell-with-file
namespace: company.team

inputs:
  - id: file
    type: FILE

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{inputs.file}}"
    outputFiles:
      - out.txt
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      bucket: "{{vars.bucket}}"
    commands:
      - cp {{workingDir}}/data.txt {{workingDir}}/out.txt
```

Properties

delete

  • Type: boolean
  • Dynamic:
  • Required: ✔️
  • Default: true

Whether the job should be deleted upon completion.

machineType

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️
  • Default: e2-medium

The GCP machine type.

See https://cloud.google.com/compute/docs/machine-types

region

  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

The GCP region.

resume

  • Type: boolean
  • Dynamic:
  • Required: ✔️
  • Default: true

Whether to reconnect to the current job if it already exists.

bucket

  • Type: string
  • Dynamic: ✔️
  • Required:

Google Cloud Storage bucket used to upload files (inputFiles and namespaceFiles) and download files (outputFiles).

Providing a bucket is mandatory if you want to use any of these properties.

completionCheckInterval

  • Type: string
  • Dynamic:
  • Required:
  • Default: PT5S
  • Format: duration

Determines how often Kestra should poll the container for completion. By default, the task runner checks every 5 seconds whether the job is completed. You can set this to a lower value (e.g. PT0.1S = every 100 milliseconds) for quick jobs, or to a higher value (e.g. PT1M = every minute) for long-running jobs. Setting this property to a higher value will reduce the number of API calls Kestra makes to the remote service; keep that in mind in case you see API rate limit errors.
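For example, a sketch of a runner configured to poll every minute for a long-running job (projectId and region variables are placeholders):

```yaml
taskRunner:
  type: io.kestra.plugin.gcp.runner.Batch
  projectId: "{{vars.projectId}}"
  region: "{{vars.region}}"
  completionCheckInterval: PT1M
```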

computeResource

Compute resource requirements.

ComputeResource defines the amount of resources required for each task. Make sure your tasks have enough compute resources to successfully run. If you also define the types of resources for a job to use with the InstancePolicyOrTemplate field, make sure both fields are compatible with each other.
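A sketch using the fields documented in the ComputeResource definition below (the values are illustrative):

```yaml
taskRunner:
  type: io.kestra.plugin.gcp.runner.Batch
  projectId: "{{vars.projectId}}"
  region: "{{vars.region}}"
  computeResource:
    cpu: "2000"     # 2 vCPUs per task, in milliCPU units
    memory: "4096"  # 4 GiB per task, in MiB
```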

entryPoint

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:

Container entrypoint to use.

networkInterfaces

Network interfaces.
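A sketch using the NetworkInterface definition below (project, network, and subnet names are placeholders):

```yaml
taskRunner:
  type: io.kestra.plugin.gcp.runner.Batch
  projectId: "{{vars.projectId}}"
  region: "{{vars.region}}"
  networkInterfaces:
    - network: projects/my-host-project/global/networks/my-network
      subnetwork: projects/my-host-project/regions/us-central1/subnetworks/my-subnet
```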

projectId

  • Type: string
  • Dynamic: ✔️
  • Required:

The GCP project ID.

reservation

  • Type: string
  • Dynamic: ✔️
  • Required:

Compute reservation.

scopes

  • Type: array
  • SubType: string
  • Dynamic: ✔️
  • Required:
  • Default: [https://www.googleapis.com/auth/cloud-platform]

The GCP scopes to be used.

serviceAccount

  • Type: string
  • Dynamic: ✔️
  • Required:

The GCP service account key.

waitForLogInterval

  • Type: string
  • Dynamic:
  • Required:
  • Default: PT5S
  • Format: duration

Additional time after the job ends to wait for late logs.

waitUntilCompletion

  • Type: string
  • Dynamic:
  • Required:
  • Default: PT1H
  • Format: duration

The maximum duration to wait for job completion, unless the task's timeout property is set, which takes precedence over this property.

Google Cloud Batch will automatically time out the job once this duration is reached, and the task will fail.
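For example, a sketch allowing a job up to two hours before Batch times it out (projectId and region variables are placeholders):

```yaml
taskRunner:
  type: io.kestra.plugin.gcp.runner.Batch
  projectId: "{{vars.projectId}}"
  region: "{{vars.region}}"
  waitUntilCompletion: PT2H
```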

Outputs

Definitions

io.kestra.plugin.gcp.runner.Batch-NetworkInterface

Properties

network
  • Type: string
  • Dynamic: ✔️
  • Required: ✔️

Network identifier with the format projects/HOST_PROJECT_ID/global/networks/NETWORK.

subnetwork
  • Type: string
  • Dynamic: ✔️
  • Required:

Subnetwork identifier in the format projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET.

io.kestra.plugin.gcp.runner.Batch-ComputeResource

Properties

bootDisk
  • Type: string
  • Dynamic:
  • Required:

Extra boot disk size for each task.

cpu
  • Type: string
  • Dynamic:
  • Required:

The milliCPU count.

Defines the amount of CPU resources per task in milliCPU units. For example, 1000 corresponds to 1 vCPU per task. If undefined, the default value is 2000. If you also define the VM's machine type using the machineType property in InstancePolicy field or inside the instanceTemplate in the InstancePolicyOrTemplate field, make sure the CPU resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.

For example, if you specify the n2-standard-2 machine type, which has 2 vCPUs, you can set the cpu to no more than 2000. Alternatively, you can run two tasks on the same VM if you set the cpu to 1000 or less.

memory
  • Type: string
  • Dynamic:
  • Required:

Memory in MiB.

Defines the amount of memory per task in MiB units. If undefined, the default value is 2GB. If you also define the VM's machine type using the machineType in InstancePolicy field or inside the instanceTemplate in the InstancePolicyOrTemplate field, make sure the memory resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.

For example, if you specify the n2-standard-2 machine type, which has 8 GiB of memory, you can set the memory to no more than 8GB. Alternatively, you can run two tasks on the same VM if you set the memory to 4GB or less.
