Troubleshooting

Something doesn't work as expected? Check out this page to help you troubleshoot.

CrashLoopBackOff when restarting all pods

Question from a user: "When I restart all Kubernetes pods at once, they get stuck in a CrashLoopBackOff for several minutes before eventually resolving. Why does this happen?"

You are likely hitting a startup issue common to Java applications: the JVM loads many classes at startup, which consumes a lot of CPU and can cause the liveness probes to fail. A CPU limit as low as 0.5 tends to throttle this startup phase; increasing the limit should improve the startup time of the server and resolve the failing healthchecks.
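
For example, here is a minimal sketch of the relevant Kubernetes pod spec fields. The probe path and port assume Kestra's default management endpoint on 8081; adjust them to your deployment.

yaml
containers:
  - name: kestra
    resources:
      requests:
        cpu: "1"
      limits:
        # give the JVM enough CPU headroom for class loading at startup
        cpu: "2"
    # a startupProbe holds off liveness checks until the server is up
    startupProbe:
      httpGet:
        path: /health
        port: 8081
      periodSeconds: 10
      failureThreshold: 30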

Unprocessable execution

Sometimes, things can go wrong in unexpected ways; in such situations, you can skip an execution that Kestra is unable to process.

To do so, start the executor server (or the standalone server, if you are not running a deployment with separate components) with a list of execution identifiers to skip.

sh
kestra server executor --skip-executions 6FSPERUe1JwbYmMmdwRlgV,5iLGjTLOHAVGUGlsesFaMb
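
The same identifiers can be passed to the standalone server:

sh
kestra server standalone --skip-executions 6FSPERUe1JwbYmMmdwRlgV,5iLGjTLOHAVGUGlsesFaMb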

Locked triggers

Flow schedule and polling triggers have locks to avoid concurrent trigger evaluation and concurrent execution of a flow for a trigger.

To see a list of triggers and inspect their current status, go to the Administration -> Triggers section in the Kestra UI. From here, you can unlock a trigger if it is locked. Keep in mind that if you unlock a trigger manually, there is a risk of concurrent trigger evaluation or concurrent flow execution for that trigger.

Docker in Docker (DinD)

If you face issues using Docker in Docker, e.g., with Script tasks using the DOCKER runner, start troubleshooting by attaching a terminal: docker run -it --privileged docker:dind sh. Then, use docker logs container_ID to get the container logs, and docker inspect container_ID to get more information about your Docker container. The output of docker inspect displays details about the container, its environment, network settings, etc. This information can help you identify what might be wrong.
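
For reference, the commands in sequence (container_ID is a placeholder for your actual container ID):

sh
# start a privileged DinD container and attach a shell
docker run -it --privileged docker:dind sh
# from another terminal: fetch logs and detailed configuration
docker logs container_ID
docker inspect container_ID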

Docker in Docker using Helm charts

On some Kubernetes deployments, using DinD with our default Helm charts can lead to:

bash
Device "ip_tables" does not exist.
ip_tables              24576  4 iptable_raw,iptable_mangle,iptable_nat,iptable_filter
modprobe: can't change directory to '/lib/modules': No such file or directory
error: attempting to run rootless dockerd but need 'kernel.unprivileged_userns_clone' (/proc/sys/kernel/unprivileged_userns_clone) set to 1

To fix this, launch the DinD container as root by setting the following values:

yaml
dind:
  image:
    tag: dind
  args:
    - --log-level=fatal
  # run the DinD container as root
  securityContext:
    runAsUser: 0
    runAsGroup: 0

# also run the Kestra containers as root
securityContext:
  runAsUser: 0
  runAsGroup: 0
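
To apply these values, assuming you installed Kestra with the official Helm chart under the release name kestra (adjust the names to your setup):

sh
helm repo add kestra https://helm.kestra.io/
helm upgrade --install kestra kestra/kestra -f values.yaml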

Docker in Docker (DinD) on a Mac with ARM-based Apple silicon chip

If you are getting an error similar to: java.io.IOException: com.sun.jna.LastErrorException: [111] Connection refused, it might be related to a Docker in Docker (DinD) issue. Try using an embedded Docker server as shown below:

yaml
volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local
  dind-socket:
    driver: local
  tmp-data:
    driver: local
services:
  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10

  dind:
    image: docker:dind
    privileged: true
    # user: "1000"
    environment:
      DOCKER_HOST: unix://dind/docker.sock
    command:
      - --log-level=fatal
      # - --group=1000
    volumes:
      - dind-socket:/dind
      - tmp-data:/tmp/kestra-wd

  kestra:
    image: kestra/kestra:latest-full
    pull_policy: always
    entrypoint: /bin/bash
    # Note that this setup is meant for development only. Refer to the documentation for production deployments of Kestra, which run without a root user.
    user: "root"
    command:
      - -c
      - /app/kestra server standalone --worker-thread=128
    volumes:
      - kestra-data:/app/storage
      - dind-socket:/dind
      - tmp-data:/tmp/kestra-wd
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          server:
            basic-auth:
              enabled: false
              username: admin
              password: kestra
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmp-dir:
              path: /tmp/kestra-wd/tmp
          url: http://localhost:8080/
    ports:
      - "8080:8080"
      - "8081:8081"
    depends_on:
      postgres:
        condition: service_started
      dind:
        condition: service_started
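
Once the stack is up, a minimal flow such as the sketch below (the flow id and namespace are illustrative) can confirm that the worker is able to start containers through DinD:

yaml
id: dind_check
namespace: dev
tasks:
  - id: hello
    type: io.kestra.plugin.scripts.shell.Commands
    runner: DOCKER
    commands:
      - echo "Docker-in-Docker works"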

Networking in Docker Compose

The default docker-compose file doesn't configure networking for the Kestra containers, so you won't be able to access services exposed via localhost on your local machine (e.g., another Docker container with a mapped port): your machine and the Kestra container use different networks. To reach a locally exposed service from the Kestra container, use the host.docker.internal hostname (or the default bridge gateway address 172.17.0.1). The host.docker.internal address allows you to reach your host machine's services from Kestra's container.
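
Note that on Linux, host.docker.internal is not defined by default. You can map it to the host gateway yourself in the compose file (supported since Docker 20.10):

yaml
services:
  kestra:
    extra_hosts:
      # resolve host.docker.internal to the host's gateway IP
      - "host.docker.internal:host-gateway"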

Alternatively, you can leverage Docker networks. By default, your Kestra container is placed in a default network. You can add your custom services to the docker-compose.yml file provided by Kestra and use the service aliases (the keys under services) to reach them. Even better, create a dedicated network, e.g. kestra_net, add your services to it, and list that network in the networks section of the kestra service. Containers on the same network can then reach each other by service name (e.g., http://minio:9000). The example below shows how to add iceberg-rest, minio, and mc (the MinIO client) to your Kestra Docker Compose file.

yaml
version: "3"

volumes:
  postgres-data:
    driver: local
  kestra-data:
    driver: local

networks:
  kestra_net:

services:
  postgres:
    image: postgres
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kestra
      POSTGRES_USER: kestra
      POSTGRES_PASSWORD: k3str4
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}"]
      interval: 30s
      timeout: 10s
      retries: 10
    networks:
      kestra_net:

  iceberg-rest:
      image: tabulario/iceberg-rest
      ports:
        - 8181:8181
      environment:
        - AWS_ACCESS_KEY_ID=admin
        - AWS_SECRET_ACCESS_KEY=password
        - AWS_REGION=us-east-1
        - CATALOG_WAREHOUSE=s3://warehouse/
        - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
        - CATALOG_S3_ENDPOINT=http://minio:9000
      networks:
        kestra_net:

  minio:
      image: minio/minio
      container_name: minio
      environment:
        - MINIO_ROOT_USER=admin
        - MINIO_ROOT_PASSWORD=password
        - MINIO_DOMAIN=minio
      networks:
        kestra_net:
          aliases:
            - warehouse.minio
      ports:
        - 9001:9001
        - 9000:9000
      command: ["server", "/data", "--console-address", ":9001"]

  mc:
      depends_on:
        - minio
      image: minio/mc
      container_name: mc
      networks:
        kestra_net:
      environment:
        - AWS_ACCESS_KEY_ID=admin
        - AWS_SECRET_ACCESS_KEY=password
        - AWS_REGION=us-east-1
      entrypoint: >
        /bin/sh -c "
        until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
        /usr/bin/mc rm -r --force minio/warehouse;
        /usr/bin/mc mb minio/warehouse;
        /usr/bin/mc policy set public minio/warehouse;
        tail -f /dev/null
        "

  kestra:
      image: kestra/kestra:latest-full
      pull_policy: always
      entrypoint: /bin/bash
      # Note that this setup is meant for development only. Refer to the documentation for production deployments of Kestra, which run without a root user.
      user: "root"
      command:
        - -c
        - /app/kestra server standalone --worker-thread=128
      volumes:
        - kestra-data:/app/storage
        - /var/run/docker.sock:/var/run/docker.sock
        - /tmp/kestra-wd:/tmp/kestra-wd
      environment:
        KESTRA_CONFIGURATION: |
          datasources:
            postgres:
              url: jdbc:postgresql://postgres:5432/kestra
              driverClassName: org.postgresql.Driver
              username: kestra
              password: k3str4
          kestra:
            server:
              basic-auth:
                enabled: false
                username: admin
                password: kestra
            repository:
              type: postgres
            storage:
              type: minio
              minio:
                endpoint: http://minio
                port: 9000
                accessKey: admin
                secretKey: password
                region: us-east-1
                bucket: warehouse
            queue:
              type: postgres
            tasks:
              tmp-dir:
                path: /tmp/kestra-wd/tmp
            url: http://localhost:8080/
      ports:
        - "8080:8080"
        - "8081:8081"
      depends_on:
        postgres:
          condition: service_started
      networks:
        kestra_net:
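
To verify that the containers share the network, you can, for example, call the Iceberg REST catalog's /v1/config endpoint from inside the Kestra container (assuming curl is available in the image):

sh
docker compose exec kestra curl -s http://iceberg-rest:8181/v1/config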

Finally, you can also use the host network mode for the kestra service. The container then uses your host's network directly, so Kestra can reach all ports exposed on your machine. This means you have to change services.kestra.environment.KESTRA_CONFIGURATION.datasources.postgres.url to jdbc:postgresql://localhost:5432/kestra. This is the easiest approach, but it can be a security risk. See the example below using network_mode: host.

yaml
version: "3"

volumes:
  kestra-data:
    driver: local

services:
  kestra:
    image: kestra/kestra:latest-full
    pull_policy: always
    entrypoint: /bin/bash
    network_mode: host
    environment:
      JAVA_OPTS: "--add-opens java.base/java.nio=ALL-UNNAMED"
      NODE_OPTIONS: "--max-old-space-size=4096"
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://localhost:5432/kestra
            driverClassName: org.postgresql.Driver
            username: kestra
            password: k3str4
        kestra:
          server:
            basic-auth:
              enabled: false
              username: admin
              password: kestra
          anonymous-usage-report:
            enabled: true
          repository:
            type: postgres
          storage:
            type: local
            local:
              base-path: "/app/storage"
          queue:
            type: postgres
          tasks:
            tmp-dir:
              path: /tmp/kestra-wd/tmp
            scripts:
              docker:
                volume-enabled: true
            defaults: # just one example to show global taskDefaults
              - type: io.kestra.plugin.airbyte.connections.Sync
                url: http://host.docker.internal:8000/
                username: airbyte
                password: password
          url: http://localhost:8080/
          variables:
            env-vars-prefix: "" # to avoid requiring KESTRA_ prefix on env vars
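
After starting the stack, you can check that the UI is reachable through the host's network:

sh
docker compose up -d
# with network_mode: host, the UI binds directly to port 8080 on the host
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/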