Source
```yaml
id: gcp-batch-runner
namespace: company.team

tasks:
  - id: scrape_environment_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{ secret('GCP_PROJECT_ID') }}"
      region: europe-west9
      bucket: "{{ secret('GCS_BUCKET') }}"
      serviceAccount: "{{ secret('GOOGLE_SA') }}"
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - environment_info.json
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json
        from kestra import Kestra

        print("Hello from GCP Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = '{{ workingDir }}/environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        if __name__ == '__main__':
            print_environment_info()
```
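The script publishes `env_info` back to the flow through `Kestra.outputs()`, so downstream tasks can read those values. A minimal sketch of such a task, assuming the core `Log` task (the `log_python_version` task is illustrative and not part of the blueprint):

```yaml
  - id: log_python_version
    type: io.kestra.plugin.core.log.Log
    # values published via Kestra.outputs() are exposed under the task's `vars`
    message: "Batch ran Python {{ outputs.scrape_environment_info.vars.python_version }}"
```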
About this blueprint
Python GCP Task Runner
This flow executes a Python script on Google Cloud Batch. It requires you to set up Google Cloud Batch, a GCS bucket in the same region where you want to run the Batch jobs, and a Google service account. To support `inputFiles`, `namespaceFiles`, and `outputFiles`, the Google Batch task runner will do the following:
- mount a volume from a GCS bucket
- upload input files to the bucket before launching the container
- download output files from the bucket after the container has finished running (see the sketch below).
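Once the runner has pulled `environment_info.json` back from the bucket, a later task in the same flow can read it through the task's `outputFiles`. A minimal sketch, where the `read_output` task is hypothetical and not part of the blueprint:

```yaml
  - id: read_output
    type: io.kestra.plugin.scripts.python.Script
    script: |
      import json

      # the file was downloaded from the GCS bucket after the Batch
      # container finished and is addressed through outputFiles
      with open("{{ outputs.scrape_environment_info.outputFiles['environment_info.json'] }}") as f:
          print(json.load(f))
```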
Since we don't know the working directory of the container in advance, you need to explicitly define the working directory and output directory when using the GCP Batch runner, e.g. use `python {{ workingDir }}/main.py` rather than `python main.py`.
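Concretely, the `commands` section of the task above follows this pattern:

```yaml
    commands:
      # correct: resolve the script against the render-time working directory
      - python {{ workingDir }}/main.py
      # incorrect: relies on the container's (unknown) current directory
      # - python main.py
```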