CustomJob
Start a custom job in Google Vertex AI.

For more details, check out the custom job documentation.

```yaml
type: "io.kestra.plugin.gcp.vertexai.CustomJob"
```

Examples

```yaml
id: gcp_vertexai_custom_job
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Start Custom Job
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
```

Properties

displayName (required, string)

The job display name.

region (required, string)

The GCP region.

spec (required)

The job specification.

Definitions

io.kestra.plugin.gcp.vertexai.models.CustomJobSpec

workerPoolSpecs (required, array)

Min items: 1

The spec of the worker pools including machine type and Docker image.

All worker pools except the first one are optional and can be skipped.

io.kestra.plugin.gcp.vertexai.models.WorkerPoolSpec

containerSpec (required)

The custom container task.

io.kestra.plugin.gcp.vertexai.models.ContainerSpec

imageUri (required, string)

The URI of a container image in the Container Registry that is to be run on each worker replica.

The image must be hosted on Google Container Registry, for example: gcr.io/{{ project }}/{{ dir }}/{{ image }}:{{ tag }}

args (array)

SubType: string

The arguments to be passed when starting the container.

commands (array)

SubType: string

The command to be invoked when the container is started.

When provided, it overrides the ENTRYPOINT instruction in the Dockerfile.

env (object)

SubType: string

Environment variables to be passed to the container.
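As a minimal sketch of how the container properties above fit together, the flow below overrides the image entrypoint and passes arguments and environment variables. The script name, flag, and variable name are hypothetical placeholders, not values defined by the plugin.

```yaml
id: gcp_vertexai_custom_job_container
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job With Container Options
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
          # Hypothetical command; replaces the image entrypoint
          commands:
            - python
            - train.py
          # Hypothetical arguments passed when the container starts
          args:
            - --epochs=10
          # Hypothetical environment variable; values are strings
          env:
            LOG_LEVEL: "INFO"
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
```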

machineSpec (required)

The specification of a single machine.

io.kestra.plugin.gcp.vertexai.models.MachineSpec

machineType (required, string)

The type of the machine.

See the list of machine types supported for prediction and the list of machine types supported for custom training.

acceleratorCount (integer or string)

The number of accelerators to attach to the machine.

acceleratorType (string)

Possible Values

ACCELERATOR_TYPE_UNSPECIFIED, NVIDIA_TESLA_K80, NVIDIA_TESLA_P100, NVIDIA_TESLA_V100, NVIDIA_TESLA_P4, NVIDIA_TESLA_T4, NVIDIA_TESLA_A100, NVIDIA_A100_80GB, NVIDIA_L4, NVIDIA_H100_80GB, NVIDIA_H100_MEGA_80GB, NVIDIA_H200_141GB, NVIDIA_B200, NVIDIA_GB200, NVIDIA_RTX_PRO_6000, TPU_V2, TPU_V3, TPU_V4_POD, TPU_V5_LITEPOD, UNRECOGNIZED

The type of accelerator(s) that may be attached to the machine.

discSpec

The specification of the disk.

io.kestra.plugin.gcp.vertexai.models.DiscSpec

bootDiskSizeGb (integer or string)

Default: 100

Size in GB of the boot disk.

bootDiskType (string)

Default: PD_SSD

Possible Values

PD_SSD, PD_STANDARD

Type of the boot disk.
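The machine and disk options can be combined in one worker pool. The sketch below attaches a single accelerator and sets a larger standard boot disk. It assumes that discSpec sits next to machineSpec inside the worker pool and that the chosen accelerator is available for the machine type in the selected region; the sizes and types shown are illustrative only.

```yaml
id: gcp_vertexai_custom_job_gpu
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job With Accelerator
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
          # One of the documented acceleratorType values
          acceleratorType: NVIDIA_TESLA_T4
          acceleratorCount: 1
        # Assumed placement: sibling of machineSpec in the worker pool
        discSpec:
          bootDiskType: PD_STANDARD   # default is PD_SSD
          bootDiskSizeGb: 200         # default is 100
        replicaCount: 1
```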

pythonPackageSpec

The python package specs.

io.kestra.plugin.gcp.vertexai.models.PythonPackageSpec

args (required, array)

SubType: string

envs (required, object)

SubType: string

Environment variables to be passed to the Python module.

At most 100 variables can be set.

packageUris (required, array)

SubType: string

The Google Cloud Storage location of the Python package files which are the training program and its dependent packages.

The maximum number of package URIs is 100.

replicaCount (integer or string)

The number of worker replicas to use for this worker pool.
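Because only the first worker pool is mandatory, a distributed job can add further pools as extra list items. The following sketch declares a second pool with more replicas; the split into one primary replica and two additional workers is a hypothetical layout chosen for illustration.

```yaml
id: gcp_vertexai_custom_job_distributed
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Distributed Custom Job
    spec:
      workerPoolSpecs:
      # First pool (mandatory)
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
      # Second pool (optional); hypothetical additional workers
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 2
```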

baseOutputDirectory

The Cloud Storage location to store the output of this job.

io.kestra.plugin.gcp.vertexai.models.GcsDestination

outputUriPrefix (required, string)

Google Cloud Storage URI to output directory.

If the URI doesn't end with '/', a '/' will be automatically appended. The directory is created if it doesn't exist.

enableWebAccess (boolean or string)

Whether you want Vertex AI to enable interactive shell access to training containers.

network (string)

The full name of the Compute Engine network to which the Job should be peered.

The format is projects/{project}/global/networks/{network}, where {project} is a project number and {network} is a network name, for example projects/12345/global/networks/myVPC. To specify this field, you must have already configured VPC Network Peering for Vertex AI. If this field is left unspecified, the job is not peered with any network.
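A sketch of the job-level options described above, set directly under spec: the output location, interactive shell access, and a peered network. The bucket name is a hypothetical placeholder, and the network value reuses the example from the description; it only works if VPC Network Peering is already configured.

```yaml
id: gcp_vertexai_custom_job_output
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job With Output Directory
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
      baseOutputDirectory:
        # Hypothetical bucket; a trailing '/' is appended if missing
        outputUriPrefix: gs://my-bucket/custom-job-output/
      enableWebAccess: true
      # Requires VPC Network Peering for Vertex AI to be configured
      network: projects/12345/global/networks/myVPC
```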

scheduling

Scheduling options for a CustomJob.

io.kestra.plugin.gcp.vertexai.models.Scheduling

restartJobOnWorkerRestart (required, boolean or string)

Restarts the entire CustomJob if a worker gets restarted.

This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.

timeOut (required, string)

Format: duration

The maximum job running time. The default is 7 days.
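The scheduling block could be set as in the sketch below. It assumes the duration is written in ISO 8601 form (PT2H for two hours); the two-hour limit itself is an arbitrary illustrative value.

```yaml
id: gcp_vertexai_custom_job_scheduling
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job With Scheduling
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
      scheduling:
        # Restart the whole job if any worker restarts
        restartJobOnWorkerRestart: true
        # Assumed ISO 8601 duration; the documented default is 7 days
        timeOut: PT2H
```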

serviceAccount (string)

The service account that the workload runs as.

Users submitting jobs must have act-as permission on this run-as account. If unspecified, the Vertex AI Custom Code Service Agent (https://cloud.google.com/vertex-ai/docs/general/access-control#service-agents) for the CustomJob's project is used.

tensorboard (string)

The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs.

Format: projects/{project}/locations/{location}/tensorboards/{tensorboard}
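The run-as service account and the Tensorboard resource are both set under spec. In this sketch the service account email, project number, and Tensorboard ID are hypothetical placeholders that only illustrate the expected formats.

```yaml
id: gcp_vertexai_custom_job_tensorboard
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job With Tensorboard
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
      # Hypothetical run-as account; the submitter needs act-as permission on it
      serviceAccount: my-training-sa@my-gcp-project.iam.gserviceaccount.com
      # Hypothetical resource name in the documented format
      tensorboard: projects/12345/locations/europe-west1/tensorboards/67890
```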

delete (boolean or string)

Default: true

Delete the job at the end.

impersonatedServiceAccount (string)

The GCP service account to impersonate.

projectId (string)

The GCP project ID.

scopes (array)

SubType: string

Default: ["https://www.googleapis.com/auth/cloud-platform"]

The GCP scopes to be used.

serviceAccount (string)

The GCP service account.

wait (boolean or string)

Default: true

Wait for the job to finish.

Waiting allows the task to capture the job status and logs.
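The task-level switches sit next to spec rather than inside it. As a sketch, the flow below keeps the default of waiting for completion but disables deletion, so the job stays visible in Vertex AI after the run. Whether to keep jobs around is a choice for your own setup.

```yaml
id: gcp_vertexai_custom_job_keep
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Custom Job Kept After Run
    # Wait for the job to finish so status and logs are captured (default)
    wait: true
    # Keep the job instead of deleting it at the end (default is true)
    delete: false
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1
```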

Outputs

createDate (required, string)

Format: date-time

Time when the CustomJob was created.

endDate (required, string)

Format: date-time

Time when the CustomJob ended.

name (required, string)

Resource name of a CustomJob.

state (required, string)

Possible Values

JOB_STATE_UNSPECIFIED, JOB_STATE_QUEUED, JOB_STATE_PENDING, JOB_STATE_RUNNING, JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLING, JOB_STATE_CANCELLED, JOB_STATE_PAUSED, JOB_STATE_EXPIRED, JOB_STATE_UPDATING, JOB_STATE_PARTIALLY_SUCCEEDED, UNRECOGNIZED

The detailed state of the CustomJob.
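
Downstream tasks can read these outputs through the usual outputs expression. The sketch below logs the resource name and final state of the job; the follow-up task and its id are hypothetical additions, and the expressions assume the CustomJob task id is custom_job.

```yaml
id: gcp_vertexai_custom_job_outputs
namespace: company.team

tasks:
  - id: custom_job
    type: io.kestra.plugin.gcp.vertexai.CustomJob
    projectId: my-gcp-project
    region: europe-west1
    displayName: Start Custom Job
    spec:
      workerPoolSpecs:
      - containerSpec:
          imageUri: gcr.io/my-gcp-project/my-dir/my-image:latest
        machineSpec:
          machineType: n1-standard-4
        replicaCount: 1

  # Hypothetical follow-up task that reads the outputs of custom_job
  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: "Job {{ outputs.custom_job.name }} finished in state {{ outputs.custom_job.state }}"
```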