Google Batch Task Runner

Available on: Enterprise Edition and Cloud, version >= 0.18.0

Run tasks as containers on Google Cloud VMs.

How to use the Google Batch task runner

The Google Batch task runner deploys a container for each task on a specified Google Cloud Batch VM.

To launch tasks on Google Cloud Batch, you should understand three main concepts, which are shown together in the configuration sketch after this list:

  1. Machine type — A required property that defines the compute machine type where the task will be deployed. If no reservation is specified, a new compute instance will be created for each batch, which can add up to a minute of startup latency.
  2. Reservation — An optional property that lets you reserve virtual machines in advance to avoid the delay of provisioning new instances for every task.
  3. Network interfaces — Optional; if not specified, the runner will use the default network interface.
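For example, a taskRunner configuration that sets all three concepts could look like the sketch below. The property names (machineType, reservation, networkInterfaces) and the placeholder values are assumptions about the plugin schema; check the Batch task runner plugin documentation for the authoritative list of properties.

yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "{{ secret('GCP_PROJECT_ID') }}"
  region: europe-west9
  bucket: "{{ secret('GCS_BUCKET') }}"
  # Machine type the container is deployed on.
  machineType: e2-medium
  # Optional: reuse pre-provisioned VMs to avoid per-task instance startup latency.
  reservation: my-reservation
  # Optional: the default network interface is used when omitted.
  networkInterfaces:
    - network: my-vpc-network
      subnetwork: my-subnetwork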

How the Google Batch task runner works

To support inputFiles, namespaceFiles, and outputFiles, the Google Batch task runner performs the following actions:

  • Mounts a volume from a GCS bucket.
  • Uploads input files to the bucket before launching the container.
  • Downloads output files from the bucket after the container finishes.

Because the container’s working directory is not known ahead of time, you must explicitly define the working and output directories. For example, use python {{ workingDir }}/main.py instead of python main.py.

Example flow

yaml
id: gcp_batch_runner
namespace: company.team

variables:
  region: europe-west9

tasks:
  - id: scrape_environment_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{ secret('GCP_PROJECT_ID') }}"
      region: "{{ vars.region }}"
      bucket: "{{ secret('GCS_BUCKET') }}"
      serviceAccount: "{{ secret('GOOGLE_SA') }}"
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - "environment_info.json"
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json
        from kestra import Kestra

        print("Hello from GCP Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = '{{ workingDir }}/environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        if __name__ == '__main__':
            print_environment_info()

Full setup guide: running Google Batch from scratch

Before you begin

You’ll need the following prerequisites:

  1. A Google Cloud account.
  2. A Kestra instance (version 0.16.0 or later) with Google credentials stored as secrets or set as environment variables.
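How you store those credentials depends on your deployment. As one illustrative option, assuming you run Kestra with Docker Compose and use environment-variable secrets, you can expose base64-encoded values as SECRET_-prefixed variables; the names below match the secrets referenced in the flows later in this guide:

yaml
services:
  kestra:
    image: kestra/kestra:latest
    environment:
      # Values supplied as SECRET_* environment variables must be base64-encoded.
      SECRET_GCP_PROJECT_ID: "<base64-encoded project ID>"
      SECRET_GCS_BUCKET: "<base64-encoded bucket name>"
      SECRET_GOOGLE_SA: "<base64-encoded service account JSON key>"

These values are then accessible in flows as {{ secret('GCP_PROJECT_ID') }}, {{ secret('GCS_BUCKET') }}, and {{ secret('GOOGLE_SA') }}.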

Google Cloud Console setup

Create a project

If you don’t already have one, create a new project in the Google Cloud Console.

Once created, ensure your new project is selected in the top navigation bar.

Enable the Batch API

Navigate to the APIs & Services section and search for Batch API. Enable it so Kestra can create and manage Batch jobs.

After enabling the API, you’ll be prompted to create credentials for integration.

Create a service account

Once the Batch API is active, create a service account to allow Kestra to access GCP resources.

Follow the prompt for Application data, which will generate a new service account.

Give the service account a descriptive name.

Assign the following roles:

  • Batch Job Editor
  • Logs Viewer
  • Storage Object Admin

Next, create a key for this service account by going to Keys → Add Key, and choose JSON. This will generate credentials you can add to Kestra as a secret or directly into your flow configuration.

See Google credentials guide for more details.

Grant this service account access to the Compute Engine default service account by navigating to IAM & Admin → Service Accounts → Permissions → Grant Access, then assigning the Service Account User role.

Create a storage bucket

Search for “Bucket” in the Cloud Console and create a new GCS bucket. You can keep the default configuration for now.

Create a flow

Below is a sample flow that runs a Python file (main.py) using the Google Batch Task Runner. The taskRunner section defines properties such as the project, region, and bucket.

yaml
containerImage: ghcr.io/kestra-io/kestrapy:latest
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "{{ secret('GCP_PROJECT_ID') }}"
  region: "{{ vars.region }}"
  bucket: "{{ secret('GCS_BUCKET') }}"
  serviceAccount: "{{ secret('GOOGLE_SA') }}"

Here’s the full flow configuration:

yaml
id: gcp_batch_runner
namespace: company.team

variables:
  region: europe-west2

tasks:
  - id: scrape_environment_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/kestrapy:latest
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{ secret('GCP_PROJECT_ID') }}"
      region: "{{ vars.region }}"
      bucket: "{{ secret('GCS_BUCKET') }}"
      serviceAccount: "{{ secret('GOOGLE_SA') }}"
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - "environment_info.json"
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json
        from kestra import Kestra

        print("Hello from GCP Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = '{{ workingDir }}/environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        print_environment_info()

When you execute the flow, the logs will show the task runner being created.

You can also confirm job creation directly in the Google Cloud Console.

After the task completes, the runner automatically shuts down. You can review output artifacts in Kestra’s Outputs tab.
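If you want to consume those outputs in a downstream task, a follow-up task appended under the flow's tasks could read the values published by Kestra.outputs() as well as the generated file. This is an illustrative sketch, not part of the original example:

yaml
  - id: log_environment_info
    type: io.kestra.plugin.core.log.Log
    # Values published via Kestra.outputs() are exposed under the task's `vars` output;
    # files declared in outputFiles are available under `outputFiles`.
    message: |
      Host: {{ outputs.scrape_environment_info.vars.host }}
      Python version: {{ outputs.scrape_environment_info.vars.python_version }}
      Output file URI: {{ outputs.scrape_environment_info.outputFiles['environment_info.json'] }}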
