About this blueprint
GCP Python Task Runner
This flow will execute a Python script on Google Batch. This requires you to setup Google Cloud Batch, GCS in the same region that you want to run Google Batch Jobs and Google Service Account.
In order to support inputFiles, namespaceFiles, and outputFiles, the Google Batch task runner will do the following:
- mount a volume from a GCS bucket
- upload input files to the bucket before launching the container
- download output files from the bucket after the container has finished running.
Since we don't know the working directory of the container in advance, you need to explicitly define the working directory and output directory when using the GCP Batch runner, e.g. use python {{ workingDir }}/main.py
rather than python main.py.
yaml
id: gcp_batch_runner
namespace: company.team
tasks:
- id: scrape_environment_info
type: io.kestra.plugin.scripts.python.Commands
containerImage: ghcr.io/kestra-io/pydata:latest
taskRunner:
type: io.kestra.plugin.ee.gcp.runner.Batch
projectId: "{{ secret('GCP_PROJECT_ID') }}"
region: europe-west9
bucket: "{{ secret('GCS_BUCKET')}}"
serviceAccount: "{{ secret('GOOGLE_SA') }}"
commands:
- python {{ workingDir }}/main.py
namespaceFiles:
enabled: true
outputFiles:
- "environment_info.json"
inputFiles:
main.py: |
import platform
import socket
import sys
import json
from kestra import Kestra
print("Hello from GCP Batch and kestra!")
def print_environment_info():
print(f"Host's network name: {platform.node()}")
print(f"Python version: {platform.python_version()}")
print(f"Platform information (instance type): {platform.platform()}")
print(f"OS/Arch: {sys.platform}/{platform.machine()}")
env_info = {
"host": platform.node(),
"platform": platform.platform(),
"OS": sys.platform,
"python_version": platform.python_version(),
}
Kestra.outputs(env_info)
filename = '{{ workingDir }}/environment_info.json'
with open(filename, 'w') as json_file:
json.dump(env_info, json_file, indent=4)
if __name__ == '__main__':
print_environment_info()