Batch
Task runner that executes a task inside a job in Google Cloud Batch.
::alert{type="info"}
This plugin is only available in the Enterprise Edition (EE).
::
This task runner is container-based, so the `containerImage` property must be set.
You need the 'Batch Job Editor' and 'Logs Viewer' roles to be able to use it.
To access the task's working directory, use the `{{workingDir}}` Pebble expression or the `WORKING_DIR` environment variable. Input files and namespace files will be available in this directory.
To generate output files, you can either use the `outputFiles` task property and create a file with the same name in the task's working directory, or create any file in the output directory, which can be accessed via the `{{outputDir}}` Pebble expression or the `OUTPUT_DIR` environment variable.
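A minimal sketch of the second approach (the project, region, and bucket values below are placeholders): any file written to the output directory is collected as a task output without being listed under `outputFiles`.

```yaml
- id: shell
  type: io.kestra.plugin.scripts.shell.Commands
  containerImage: ubuntu
  taskRunner:
    type: io.kestra.plugin.ee.gcp.runner.Batch
    projectId: "my-project"   # placeholder
    region: "europe-west1"    # placeholder
    bucket: "my-bucket"       # placeholder, required for file handling
  commands:
    # any file created under the output directory becomes a task output
    - echo "hello" > {{outputDir}}/result.txt
```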
To use the `inputFiles`, `outputFiles`, or `namespaceFiles` properties, make sure to set the `bucket` property. The bucket serves as an intermediary storage layer for the task runner. Input and namespace files will be uploaded to the Cloud Storage bucket before the task run. Similarly, the task runner will store `outputFiles` in this bucket during the task run. In the end, the task runner will make those files available for download and preview from the UI by sending them to internal storage.
The task runner will generate a folder in the configured `bucket` for each task run. You can access that folder using the `{{bucketPath}}` Pebble expression or the `BUCKET_PATH` environment variable.
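For illustration, a command can reference that folder directly; this is just a fragment showing both forms:

```yaml
commands:
  # both forms below resolve to the per-task-run folder in the configured bucket
  - echo "Pebble expression: {{bucketPath}}"
  - echo "Environment variable: $BUCKET_PATH"
```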
Warning: contrary to other task runners, this task runner does not run the task in the working directory but in the root directory. You must use the `{{workingDir}}` Pebble expression or the `WORKING_DIR` environment variable to access files.
Note that if the Kestra Worker running this task is terminated, the Batch job will still run until completion. After restarting, the Worker will resume processing on the existing job unless `resume` is set to false.
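If you prefer a fresh Batch job to be started after a Worker restart, a hedged sketch of disabling that behaviour (project and region values are placeholders) could look like:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"   # placeholder
  region: "europe-west1"    # placeholder
  resume: false             # do not resume the existing Batch job after a Worker restart
```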
type: "io.kestra.plugin.ee.gcp.runner.Batch"
Execute a Shell command.

```yaml
id: new-shell
namespace: company.team

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
    commands:
      - echo "Hello World"
```
Pass input files to the task, execute a Shell command, then retrieve output files.

```yaml
id: new-shell-with-file
namespace: company.team

inputs:
  - id: file
    type: FILE

tasks:
  - id: shell
    type: io.kestra.plugin.scripts.shell.Commands
    inputFiles:
      data.txt: "{{inputs.file}}"
    outputFiles:
      - out.txt
    containerImage: centos
    taskRunner:
      type: io.kestra.plugin.ee.gcp.runner.Batch
      projectId: "{{vars.projectId}}"
      region: "{{vars.region}}"
      bucket: "{{vars.bucket}}"
    commands:
      - cp {{workingDir}}/data.txt {{workingDir}}/out.txt
```
YES
e2-medium
The GCP machine type.
YES
true
YES
Google Cloud Storage bucket used to upload (`inputFiles` and `namespaceFiles`) and download (`outputFiles`) files.
It's mandatory to provide a bucket if you want to use these properties.
YES
PT5S
duration
Determines how often Kestra should poll the container for completion. By default, the task runner checks every 5 seconds whether the job is completed. You can set this to a lower value (e.g. `PT0.1S` = every 100 milliseconds) for quick jobs and to a higher value (e.g. `PT1M` = every minute) for long-running jobs. Setting this property to a higher value will reduce the number of API calls Kestra makes to the remote service — keep that in mind in case you see API rate limit errors.
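Assuming this interval is exposed as a `completionCheckInterval` property on the task runner (the property name is an assumption here), a sketch for a long-running job could look like:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"          # placeholder
  region: "europe-west1"           # placeholder
  completionCheckInterval: PT1M    # poll once a minute to reduce API calls (property name assumed)
```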
NO
Compute resource requirements.
ComputeResource defines the amount of resources required for each task. Make sure your tasks have enough compute resources to successfully run. If you also define the types of resources for a job to use with the InstancePolicyOrTemplate field, make sure both fields are compatible with each other.
YES
true
YES
Container entrypoint to use.
NO
Lifecycle management schema when any task in a task group is failed.
Currently we only support one lifecycle policy. When the lifecycle policy condition is met, the action in the policy will be executed. If the task execution result does not match any defined lifecycle policy, the default policy applies: if the exit code is 0, the task exits; if the task ends with a non-zero exit code, the task is retried up to max_retry_count times.
YES
2
NO
>= 0
<= 10
Maximum number of retries on failures.
The default is 0, which means never retry.
YES
The GCP project ID.
YES
The GCP region.
YES
Compute reservation.
YES
["https://www.googleapis.com/auth/cloud-platform"]
The GCP scopes to be used.
YES
The GCP service account key.
YES
PT5S
duration
Additional time after the job ends to wait for late logs.
YES
PT1H
duration
The maximum duration to wait for job completion, unless the task's `timeout` property is set, which takes precedence over this property.
Google Cloud Batch will automatically time out the job upon reaching this duration, and the task will be failed.
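As an illustration, assuming this maximum wait is exposed as a `waitUntilCompletion` property (the name is an assumption), a job expected to run longer than the one-hour default could be configured like this:

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"      # placeholder
  region: "europe-west1"       # placeholder
  waitUntilCompletion: PT4H    # assumed property name; raises the 1-hour default for long jobs
```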
YES
Exit codes of a task execution.
If there is more than one exit code, the condition is met and the action will be executed when the task exits with any of the exit codes in the list.
YES
ACTION_UNSPECIFIED
RETRY_TASK
FAIL_TASK
UNRECOGNIZED
Action on task failures based on different conditions.
NO
Conditions for actions to deal with task failures.
YES
Network identifier with the format `projects/HOST_PROJECT_ID/global/networks/NETWORK`.
YES
Subnetwork identifier in the format `projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET`.
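Putting the two identifiers together, a hedged sketch (the `network` and `subnetwork` property names are inferred from the descriptions above; all ids are placeholders):

```yaml
taskRunner:
  type: io.kestra.plugin.ee.gcp.runner.Batch
  projectId: "my-project"   # placeholder
  region: "europe-west1"    # placeholder
  network: projects/my-host-project/global/networks/my-network                      # placeholder
  subnetwork: projects/my-host-project/regions/europe-west1/subnetworks/my-subnet   # placeholder
```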
YES
Extra boot disk size for each task.
YES
The milliCPU count.
Defines the amount of CPU resources per task in milliCPU units. For example, `1000` corresponds to 1 vCPU per task. If undefined, the default value is `2000`.
If you also define the VM's machine type using the `machineType` property in the InstancePolicy field or inside the `instanceTemplate` in the InstancePolicyOrTemplate field, make sure the CPU resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the `n2-standard-2` machine type, which has 2 vCPUs, you can set the `cpu` to no more than `2000`. Alternatively, you can run two tasks on the same VM if you set the `cpu` to `1000` or less.
YES
Memory in MiB.
Defines the amount of memory per task in MiB units. If undefined, the default value is `2048`.
If you also define the VM's machine type using the `machineType` in the InstancePolicy field or inside the `instanceTemplate` in the InstancePolicyOrTemplate field, make sure the memory resources for both fields are compatible with each other and with how many tasks you want to allow to run on the same VM at the same time.
For example, if you specify the `n2-standard-2` machine type, which has 8 GiB of memory, you can set the `memory` to no more than `8192`.