Batch
Task runner that executes a task inside a job in Google Cloud Batch.
::alert{type="info"}
This plugin is only available in the Enterprise Edition (EE).
::
This task runner is container-based, so the `containerImage` property must be set.
You need the 'Batch Job Editor' and 'Logs Viewer' roles to be able to use it.
To access the task's working directory, use the `{{workingDir}}` Pebble expression or the `WORKING_DIR` environment variable. Input files and namespace files will be available in this directory.
To generate output files, you can either use the `outputFiles` task property and create a file with the same name in the task's working directory, or create any file in the output directory, which can be accessed via the `{{outputDir}}` Pebble expression or the `OUTPUT_DIR` environment variable.
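A minimal sketch of the second approach (the project, region, and bucket values below are placeholders): any file written to the output directory is collected as a task output without being listed under `outputFiles`.

```yaml
- id: shell
  type: io.kestra.plugin.scripts.shell.Commands
  containerImage: ubuntu
  taskRunner:
    type: io.kestra.plugin.ee.gcp.runner.Batch
    projectId: "my-project"   # placeholder
    region: "europe-west1"    # placeholder
    bucket: "my-bucket"       # placeholder, required for file handling
  commands:
    # any file created under the output directory becomes a task output
    - echo "hello" > {{outputDir}}/result.txt
```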
To use the `inputFiles`, `outputFiles`, or `namespaceFiles` properties, make sure to set the `bucket` property. The bucket serves as an intermediary storage layer for the task runner. Input and namespace files will be uploaded to the Cloud Storage bucket before the task run. Similarly, the task runner will store `outputFiles` in this bucket during the task run. In the end, the task runner will make those files available for download and preview from the UI by sending them to internal storage.
The task runner will generate a folder in the configured `bucket` for each task run. You can access that folder using the `{{bucketPath}}` Pebble expression or the `BUCKET_PATH` environment variable.
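For illustration, a command can reference that folder directly; this is just a fragment showing both forms:

```yaml
commands:
  # both forms below resolve to the per-task-run folder in the configured bucket
  - echo "Pebble expression: {{bucketPath}}"
  - echo "Environment variable: $BUCKET_PATH"
```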
Warning: contrary to other task runners, this task runner does not run the task in the working directory but in the root directory. You must use the `{{workingDir}}` Pebble expression or the `WORKING_DIR` environment variable to access files.
Note that if the Kestra Worker running this task is terminated, the Batch job will still run until completion. After restarting, the Worker will resume processing on the existing job unless `resume` is set to false.
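If you prefer a fresh Batch job to be started after a Worker restart, a hedged sketch of disabling that behaviour (project and region values are placeholders) could look like:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"   # placeholder
  region: "europe-west1"    # placeholder
  resume: false             # do not resume the existing Batch job after a Worker restart
```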
type: "io.kestra.plugin.ee.gcp.runner.Batch"
Execute a Shell command.

```yaml
id: new-shell
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
    commands:
      - echo "Hello World"
```
Pass input files to the task, execute a Shell command, then retrieve output files.

```yaml
id: new-shell-with-file
namespace: company.team

inputs:
  - id: file
    type: FILE

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{inputs.file}}"
    outputFiles:
      - out.txt
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      bucket: "{{vars.bucket}}"
    commands:
      - cp {{workingDir}}/data.txt {{workingDir}}/out.txt
```
YES
e2-medium
The GCP machine type.
YES
true
YES
Google Cloud Storage bucket used to upload (`inputFiles` and `namespaceFiles`) and download (`outputFiles`) files.
It's mandatory to provide a bucket if you want to use these properties.
YES
PT5S
duration
Determines how often Kestra should poll the container for completion. By default, the task runner checks every 5 seconds whether the job is completed. You can set this to a lower value (e.g. `PT0.1S` = every 100 milliseconds) for quick jobs and to a higher value (e.g. `PT1M` = every minute) for long-running jobs. Setting this property to a higher value will reduce the number of API calls Kestra makes to the remote service — keep that in mind in case you see API rate limit errors.
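Assuming this interval is exposed as a `completionCheckInterval` property on the task runner (the property name is an assumption here), a sketch for a long-running job could look like:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"          # placeholder
  region: "europe-west1"           # placeholder
  completionCheckInterval: PT1M    # poll once a minute to reduce API calls (property name assumed)
```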
NO
Compute resource requirements.
ComputeResource defines the amount of resources required for each task. Make sure your tasks have enough compute resources to successfully run. If you also define the types of resources for a job to use with the InstancePolicyOrTemplate field, make sure both fields are compatible with each other.
YES
true
YES
Container entrypoint to use.
NO
Lifecycle management schema when any task in a task group is failed.
Currently we only support one lifecycle policy. When the lifecycle policy condition is met, the action in the policy will be executed. If the task execution result does not match any defined lifecycle policy, the default policy applies: if the exit code is 0, the task exits; if the task ends with a non-zero exit code, the task is retried up to max_retry_count times.
YES
2
NO
>= 0
<= 10
Maximum number of retries on failures.
The default is 0, which means never retry.
YES
The GCP project ID.
YES
The GCP region.
YES
Compute reservation.
YES
["https://www.googleapis.com/auth/cloud-platform"]
The GCP scopes to be used.
YES
The GCP service account key.
YES
PT5S
duration
Additional time after the job ends to wait for late logs.
YES
PT1H
duration
The maximum duration to wait for job completion, unless the task's `timeout` property is set, which takes precedence over this property.
Google Cloud Batch will automatically time out the job upon reaching this duration, and the task will be failed.
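As an illustration, assuming this maximum wait is exposed as a `waitUntilCompletion` property (the name is an assumption), a job expected to run longer than the one-hour default could be configured like this:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"      # placeholder
  region: "europe-west1"       # placeholder
  waitUntilCompletion: PT4H    # assumed property name; raises the 1-hour default for long jobs
```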
YES
Exit codes of a task execution.
If there is more than one exit code, the condition is met and the action will be executed when the task exits with any of the exit codes in the list.
YES
ACTION_UNSPECIFIED
RETRY_TASK
FAIL_TASK
UNRECOGNIZED
Action on task failures based on different conditions.
NO
Conditions for actions to deal with task failures.
YES
Network identifier with the format `projects/HOST_PROJECT_ID/global/networks/NETWORK`.
YES
Subnetwork identifier in the format `projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET`.
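Putting the two identifiers together, a hedged sketch (the `network` and `subnetwork` property names are inferred from the descriptions above; all ids are placeholders):

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"   # placeholder
  region: "europe-west1"    # placeholder
  network: projects/my-host-project/global/networks/my-network                      # placeholder
  subnetwork: projects/my-host-project/regions/europe-west1/subnetworks/my-subnet   # placeholder
```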
YES
Extra boot disk size for each task.
YES
The milliCPU count.
Defines the amount of CPU resources per task in milliCPU units. For example, `1000` corresponds to 1 vCPU per task. If undefined, the default value is `2000`.
If you also define the VM's machine type using the `machineType` property in the InstancePolicy field or inside the `instanceTemplate` in the InstancePolicyOrTemplate field, make sure the CPU resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the `n2-standard-2` machine type, which has 2 vCPUs, you can set the `cpu` to no more than `2000`. Alternatively, you can run two tasks on the same VM if you set the `cpu` to `1000` or less.
YES
Memory in MiB.
Defines the amount of memory per task in MiB units. If undefined, the default value is `2048`.
If you also define the VM's machine type using the `machineType` in the InstancePolicy field or inside the `instanceTemplate` in the InstancePolicyOrTemplate field, make sure the memory resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the `n2-standard-2` machine type, which has 8 GiB of memory, you can set the `memory` to no more than `8192`.