ChatCompletion

Create a Retrieval Augmented Generation (RAG) pipeline.

```yaml
type: "io.kestra.plugin.ai.rag.ChatCompletion"
```

Chat with your data using Retrieval Augmented Generation (RAG). This flow indexes documents and then uses the RAG ChatCompletion task to interact with your data using natural-language prompts. The flow contrasts prompts sent to the LLM with and without RAG: the RAG task retrieves embeddings stored in the KV store and produces a response grounded in your data rather than hallucinating. WARNING: the KV embedding store is for quick prototyping only, as it stores the embedding vectors in Kestra's KV store and loads them all into memory.

```yaml
id: rag
namespace: company.team

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-22.md

  - id: chat_without_rag
    type: io.kestra.plugin.ai.ChatCompletion
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    messages:
    - type: user
      content: Which features were released in Kestra 0.22?

  - id: chat_with_rag
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    prompt: Which features were released in Kestra 0.22?
```

Chat with your data using Retrieval Augmented Generation (RAG) and a WebSearch content retriever. The RAG task retrieves content through a WebSearch client and produces a response grounded in data rather than hallucinating.

```yaml
id: rag
namespace: company.team

tasks:
  - id: chat_with_rag_and_websearch_content_retriever
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    contentRetrievers:
      - type: io.kestra.plugin.ai.retriever.GoogleCustomWebSearch
        apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
        csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
    prompt: What is the latest release of Kestra?
```

Chat with your data using Retrieval Augmented Generation (RAG) and an additional WebSearch tool. This flow indexes documents and then uses the RAG ChatCompletion task to interact with your data using natural-language prompts. The RAG task retrieves embeddings stored in the KV store and produces a response grounded in your data rather than hallucinating; it may also include results from a web search engine when the LLM decides to use the provided tool. WARNING: the KV embedding store is for quick prototyping only, as it stores the embedding vectors in Kestra's KV store and loads them all into memory.

```yaml
id: rag
namespace: company.team

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-22.md

  - id: chat_with_rag_and_tool
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    tools:
    - type: io.kestra.plugin.ai.tool.GoogleCustomWebSearch
      apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
      csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
    prompt: What is the latest release of Kestra?
```

Store chat memory inside a K/V pair.

```yaml
id: chat-with-memory
namespace: company.team

inputs:
  - id: first
    type: STRING
    defaults: Hello, my name is John
  - id: second
    type: STRING
    defaults: What's my name?

tasks:
  - id: first
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    memory:
      type: io.kestra.plugin.ai.memory.KestraKVMemory
    systemMessage: You are a helpful assistant; answer concisely
    prompt: "{{inputs.first}}"
  - id: second
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    memory:
      type: io.kestra.plugin.ai.memory.KestraKVMemory
      drop: true
    systemMessage: You are a helpful assistant; answer concisely
    prompt: "{{inputs.second}}"
Properties

Chat Model Provider

Default { "maxResults": 3, "minScore": 0 }

Content Retriever Configuration

System message

The system message for the language model

Default {}

Chat configuration

Additional content retrievers

Some content retrievers, such as WebSearch, can also be used as tools. The difference is that content retrievers are always invoked, whereas tools are only invoked when the LLM decides to use them; see the sketch below.
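For illustration, here is the same web search configured both ways, reusing the GoogleCustomWebSearch examples from the flows above; the content retriever is queried on every prompt, while the tool is only called when the LLM chooses to:

```yaml
# Always invoked: results are injected into the LLM context on every call
contentRetrievers:
  - type: io.kestra.plugin.ai.retriever.GoogleCustomWebSearch
    apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
    csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"

# Invoked only when the LLM decides to call the tool
tools:
  - type: io.kestra.plugin.ai.tool.GoogleCustomWebSearch
    apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
    csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
```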

Embedding Store Model Provider

Optional; if not set, the embedding model will be created by the chatModelProvider. In this case, be sure that the chatModelProvider supports embeddings, as sketched below.
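A minimal sketch, adapted from the examples above, that omits embeddingProvider so the chat provider is reused for embeddings; whether gemini-2.5-flash can serve embeddings is an assumption here, not something this page confirms:

```yaml
- id: chat_with_rag
  type: io.kestra.plugin.ai.rag.ChatCompletion
  chatProvider:
    type: io.kestra.plugin.ai.provider.GoogleGemini
    modelName: gemini-2.5-flash
    apiKey: "{{ secret('GEMINI_API_KEY') }}"
  # No embeddingProvider: the embedding model is created by the chat provider,
  # so the chosen provider/model must support embeddings
  embeddings:
    type: io.kestra.plugin.ai.embeddings.KestraKVStore
  prompt: Which features were released in Kestra 0.22?
```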

Embedding Store Provider

Optional if at least one content retriever is provided

Agent Memory

Agent memory stores messages and adds them as history to the LLM context.

Text prompt

The input prompt for the language model

Tools that the LLM may use to augment its response

Generated text completion

The result of the text completion

Possible Values
STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER

Finish reason

Token usage

The database name

The database server host

The database password

The database server port

The table to store embeddings in

The database user

Default false

Whether to use an IVFFlat index

An IVFFlat index divides vectors into lists, and then searches a subset of those lists closest to the query vector. It has faster build times and uses less memory than HNSW but has lower query performance (in terms of speed-recall tradeoff).
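For orientation, a hedged sketch of a PGVector embedding store; the type name and field keys below (including useIVFFlat) are assumptions inferred from the property descriptions above, not confirmed by this page:

```yaml
embeddings:
  type: io.kestra.plugin.ai.embeddings.PGVector   # hypothetical type name
  host: localhost            # database server host
  port: 5432                 # database server port
  user: postgres             # database user
  password: "{{ secret('PG_PASSWORD') }}"
  database: vectors          # database name
  table: kestra_embeddings   # table to store embeddings in
  useIVFFlat: true           # hypothetical key; defaults to false per the description above
```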

The collection name

The host

The index name

The scheme (e.g. mongodb+srv)

Create the index

The database

SubType string

The metadata field names

The connection string options

The password

The username

API Key

Model name

API base URL

SubType string

The MCP client command, as a list of command parts.

SubType string

Environment variables

The database base URL

The collection name

Basic auth password.

Basic auth username.

API Key

Model name

Default https://api.deepseek.com/v1

API base URL

The API key

The cloud provider

The index

The cloud provider region

The namespace (default will be used if not provided)

API Key

API Key

Model endpoint

Model name

API Key

Model name

API base URL

API Key

API Key

Default 3

Maximum number of results to return

seed

Temperature

topK

topP

The name of the index to store embeddings

Default 3

The maximum number of results from the embedding store.

Default 0

The minimum score, ranging from 0 to 1 (inclusive). Only embeddings with a score >= minScore will be returned.
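Putting these retrieval settings together: the default shown earlier is { "maxResults": 3, "minScore": 0 }. A minimal sketch of overriding them, assuming the configuration object is exposed under a key such as contentRetrieverConfiguration (the key name is an assumption, not confirmed by this page):

```yaml
contentRetrieverConfiguration:   # assumed key name; defaults: maxResults 3, minScore 0
  maxResults: 5    # return up to 5 results from the embedding store
  minScore: 0.6    # only embeddings scoring >= 0.6 (range 0-1 inclusive) are returned
```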

API endpoint

The Azure OpenAI endpoint in the format: https://{resource}.openai.azure.com/

Model name

API Key

Client ID

Client secret

API version

Tenant ID

The API key

The collection name

The database server host

The database server port

Endpoint URL

Project location

Model name

Project ID

API Key

Model name

Default {{flow.id}}-embedding-store

The name of the K/V entry to use

Default false

Drop the memory at the end of the task.

By default, the memory ID is the value of the 'system.correlationId' label, so the same memory is used by all tasks of the flow and its subflows. If you want to remove the memory eagerly (before it expires), set drop: true on the last task of the flow so the memory is erased after that task's execution.

Default {{ labels.system.correlationId }}

The memory id. Defaults to the value of the 'system.correlationId' label. This means that a memory is valid for the whole flow execution including its subflows.

Default 10

The maximum number of messages to keep inside the memory.

Default PT1H
Format duration

The memory duration. Defaults to 1h.
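A sketch of a fully specified memory block, building on the flow example earlier; memoryId follows the default documented above, while messages and ttl are hypothetical key names inferred from the "maximum number of messages" and "memory duration" descriptions:

```yaml
memory:
  type: io.kestra.plugin.ai.memory.KestraKVMemory
  memoryId: "{{ labels.system.correlationId }}"   # default; shared across the flow and its subflows
  messages: 10    # hypothetical key: maximum number of messages kept (default 10)
  ttl: PT1H       # hypothetical key: memory duration (default PT1H)
  drop: false     # set to true on the flow's last task to erase the memory eagerly
```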

API Key

Default 3

Maximum number of results to return

The token

Whether to auto flush on delete

Whether to auto flush on insert

The collection name

If there is no such collection yet, it will be created automatically. Default value: "default".

The consistency level

The database name

If not provided, the default database will be used.

The host

Default value: "localhost"

The id field name

The index type

The metadata field name

The metric type

The password

If user authentication and TLS are enabled, this parameter is required. See: https://milvus.io/docs/authenticate.md

The port

Default value: "19530"

Whether to retrieve embeddings on search

The text field name

The uri

The username

If user authentication and TLS are enabled, this parameter is required. See: https://milvus.io/docs/authenticate.md

The vector field name

API Key

Model name

API Key

Weaviate API key

Your Weaviate API key. Not required for local deployment.

Weaviate host

The host of your cluster URL, e.g. "ai-4jw7ufd9.weaviate.network". You can find it under the Details tab of your Weaviate cluster.

Weaviate avoid dups

If true (the default), WeaviateEmbeddingStore generates a hashed ID based on the provided text segment, which avoids duplicate entries in the database. If false, a random ID is generated.

Possible Values
ONE, QUORUM, ALL

Weaviate consistency level

Consistency level: ONE, QUORUM (default) or ALL.

gRPC port if used

Weaviate metadata field name

The name of the metadata field to store. If not provided, will default to "_metadata".

SubType string

Weaviate metadata keys

The list of metadata keys to store. If not provided, will default to an empty list.

Weaviate object class

The object class you want to store, e.g. "MyGreatClass". Must start with an uppercase letter. If not provided, will default to "Default".

Weaviate port

The port, e.g. 8080. This parameter is optional.

Weaviate scheme

The scheme of your cluster URL, e.g. "https". You can find it under the Details tab of your Weaviate cluster.

The gRPC connection is secured

Use gRPC for inserts

Use gRPC instead of HTTP for batch inserts only. You still need HTTP configured for search.
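As an illustrative sketch only: the embedding-store type name and the exact field keys below (scheme, host, apiKey, objectClass, consistencyLevel, avoidDups) are assumptions inferred from the property descriptions above, not confirmed by this page:

```yaml
embeddings:
  type: io.kestra.plugin.ai.embeddings.Weaviate   # hypothetical type name
  scheme: https                                   # from your cluster URL details
  host: ai-4jw7ufd9.weaviate.network              # from your cluster URL details
  apiKey: "{{ secret('WEAVIATE_API_KEY') }}"      # not required for local deployments
  objectClass: KestraDocs      # must start with an uppercase letter (default "Default")
  consistencyLevel: QUORUM     # ONE, QUORUM (default), or ALL
  avoidDups: true              # hashed IDs avoid duplicate entries (default true)
```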

SubType string
Min items 1

List of HTTP Elasticsearch servers.

Must be a URI like https://elasticsearch.com:9200, with scheme and port.

Basic auth configuration.

SubType string

List of HTTP headers to be sent on every request.

Must be a string with the key and value separated by a colon, e.g. Authorization: Token XYZ.

Sets the path's prefix for every request used by the HTTP client.

For example, if this is set to /my/path, then any client request will become /my/path/ + endpoint. In essence, every request's endpoint is prefixed by this pathPrefix. The path prefix is useful when Elasticsearch is behind a proxy that provides a base path or requires all paths to start with '/'; it is not intended for other purposes and should not be supplied in other scenarios.

Whether the REST client should return any response containing at least one warning header as a failure.

Trust all SSL CA certificates.

Use this if the server is using a self-signed SSL certificate.
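Likewise, a hedged sketch of an Elasticsearch connection; the type name and every key below are assumptions inferred from the descriptions above, not confirmed by this page:

```yaml
embeddings:
  type: io.kestra.plugin.ai.embeddings.Elasticsearch   # hypothetical type name
  hosts:                                     # at least one URI with scheme and port
    - https://elasticsearch.example.com:9200
  headers:                                   # "key: value" strings sent on every request
    - "Authorization: Token XYZ"
  pathPrefix: /my/path                       # only when Elasticsearch sits behind a proxy
  trustAllSslCa: true                        # hypothetical key; for self-signed certificates
```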

SSE URL to the MCP server

Format duration

Connection timeout

AWS Access Key ID

Model name

AWS Secret Access Key

Default COHERE
Possible Values
COHERE, TITAN

Amazon Bedrock Embedding Model Type