This plugin is currently in beta. While it is considered safe to use, be aware that its API may change in backward-incompatible ways in future releases, or the plugin may become unsupported.
Create a Retrieval Augmented Generation (RAG) pipeline.
type: "io.kestra.plugin.ai.rag.ChatCompletion"
Examples
Chat with your data using Retrieval Augmented Generation (RAG). This flow indexes documents and then uses the RAG ChatCompletion task to query them with natural-language prompts. It contrasts prompts sent to the LLM with and without RAG: the RAG chat task retrieves embeddings stored in the KV Store and grounds its response in your data instead of hallucinating. WARNING: the KV embedding store is for quick prototyping only, as it stores the embedding vectors in Kestra's KV store and loads them all into memory.
```yaml
id: rag
namespace: company.team

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-22.md

  - id: chat_without_rag
    type: io.kestra.plugin.ai.ChatCompletion
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    messages:
      - type: user
        content: Which features were released in Kestra 0.22?

  - id: chat_with_rag
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    prompt: Which features were released in Kestra 0.22?
```
Chat with your data using Retrieval Augmented Generation (RAG) and a WebSearch content retriever. The RAG chat task retrieves content via a web-search client and grounds its response in the results instead of hallucinating.
```yaml
id: rag
namespace: company.team

tasks:
  - id: chat_with_rag_and_websearch_content_retriever
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    contentRetrievers:
      - type: io.kestra.plugin.ai.retriever.GoogleCustomWebSearch
        apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
        csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
    prompt: What is the latest release of Kestra?
```
Chat with your data using Retrieval Augmented Generation (RAG) and an additional WebSearch tool. This flow indexes documents and then uses the RAG ChatCompletion task to query them with natural-language prompts. The RAG chat task retrieves embeddings stored in the KV Store and grounds its response in your data instead of hallucinating. It may also include results from a web search engine when the LLM decides to use the provided tool. WARNING: the KV embedding store is for quick prototyping only, as it stores the embedding vectors in Kestra's KV store and loads them all into memory.
```yaml
id: rag
namespace: company.team

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-22.md

  - id: chat_with_rag_and_tool
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    tools:
      - type: io.kestra.plugin.ai.tool.GoogleCustomWebSearch
        apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
        csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
    prompt: What is the latest release of Kestra?
```
Store chat memory inside a K/V pair.
```yaml
id: chat-with-memory
namespace: company.team

inputs:
  - id: first
    type: STRING
    defaults: Hello, my name is John
  - id: second
    type: STRING
    defaults: What's my name?

tasks:
  - id: first
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    memory:
      type: io.kestra.plugin.ai.memory.KestraKVMemory
    systemMessage: You are a helpful assistant, answer concisely
    prompt: "{{ inputs.first }}"

  - id: second
    type: io.kestra.plugin.ai.rag.ChatCompletion
    chatProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-2.5-flash
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddingProvider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ secret('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    memory:
      type: io.kestra.plugin.ai.memory.KestraKVMemory
      drop: true
    systemMessage: You are a helpful assistant, answer concisely
    prompt: "{{ inputs.second }}"
```
Properties
chatProvider *Required, Non-dynamic. One of: AmazonBedrock, Anthropic, AzureOpenAI, DeepSeek, GoogleGemini, GoogleVertexAI, MistralAI, Ollama, OpenAI
Chat Model Provider
contentRetrieverConfiguration *Required, Non-dynamic. Type: ChatCompletion-ContentRetrieverConfiguration
{
"maxResults": 3,
"minScore": 0
}
Content Retriever Configuration
systemMessage *Required, string
System message
The system message for the language model
chatConfiguration Non-dynamic. Type: ChatConfiguration
{}
Chat configuration
contentRetrievers array
Additional content retrievers
Some content retrievers, such as WebSearch, can also be used as tools. When used as content retrievers, they are always invoked, whereas tools are only invoked when the LLM decides to use them.
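To illustrate the distinction, here is a minimal sketch (provider types and secret names are taken from the examples above; adjust them to your setup) showing the same Google Custom Search wired both ways:

```yaml
# Always invoked: search results are retrieved for every prompt
contentRetrievers:
  - type: io.kestra.plugin.ai.retriever.GoogleCustomWebSearch
    apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
    csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"

# Invoked only when the LLM decides a search is needed
tools:
  - type: io.kestra.plugin.ai.tool.GoogleCustomWebSearch
    apiKey: "{{ secret('GOOGLE_SEARCH_API_KEY') }}"
    csi: "{{ secret('GOOGLE_SEARCH_CSI') }}"
```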
embeddingProvider Non-dynamic. One of: AmazonBedrock, Anthropic, AzureOpenAI, DeepSeek, GoogleGemini, GoogleVertexAI, MistralAI, Ollama, OpenAI
Embedding Store Model Provider
Optional; if not set, the embedding model will be created by the chatModelProvider. In this case, be sure that the chatModelProvider supports embeddings.
embeddings Non-dynamic. One of: Chroma, Elasticsearch, KestraKVStore, Milvus, MongoDBAtlas, PGVector, Pinecone, Qdrant, Weaviate
Embedding Store Provider
Optional if at least one contentRetrievers entry is provided.
memory Non-dynamic. Type: KestraKVMemory
Agent Memory
Agent memory will store messages and add them as history inside the LLM context.
prompt string
Text prompt
The input prompt for the language model
tools array
Tools that the LLM may use to augment its response
Outputs
completion string
Generated text completion
The result of the text completion
finishReason string. One of: STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER
Finish reason
tokenUsage TokenUsage
Token usage
Definitions
PGVector Embedding Store
database *Requiredstring
The database name
host *Requiredstring
The database server host
password *Requiredstring
The database password
port *Requiredintegerstring
The database server port
table *Requiredstring
The table to store embeddings in
type *Requiredobject
user *Requiredstring
The database user
useIndex boolean | string
false
Whether to use an IVFFlat index
An IVFFlat index divides vectors into lists, then searches a subset of those lists closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance (in terms of the speed-recall tradeoff).
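As a sketch, a PGVector store plugged into the embeddings property might look like this (host, database, table, and secret names are placeholders):

```yaml
embeddings:
  type: io.kestra.plugin.ai.embeddings.PGVector
  host: localhost
  port: 5432
  database: vectors
  user: postgres
  password: "{{ secret('PG_PASSWORD') }}"
  table: kestra_embeddings
  useIndex: true  # IVFFlat: faster builds and less memory than HNSW
```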
MongoDB Atlas Embedding Store
collectionName *Requiredstring
The collection name
host *Requiredstring
The host
indexName *Requiredstring
The index name
scheme *Requiredstring
The scheme (e.g. mongodb+srv)
type *Requiredobject
createIndex booleanstring
Create the index
database string
The database
metadataFieldNames array
The metadata field names
options object
The connection string options
password string
The password
username string
The username
Mistral AI Model Provider
apiKey *Requiredstring
API Key
modelName *Requiredstring
Model name
type *Requiredobject
baseUrl string
API base URL
Model Context Protocol (MCP) command-based (stdio) client tool
command *Requiredarray
The MCP client command, as a list of command parts.
type *Requiredobject
environment object
Environment variables
Chroma Embedding Store
baseUrl *Requiredstring
The database base URL
collectionName *Requiredstring
The collection name
type *Requiredobject
io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection-BasicAuth
password string
Basic auth password.
username string
Basic auth username.
Deepseek Model Provider
apiKey *Requiredstring
API Key
modelName *Requiredstring
Model name
type *Requiredobject
baseUrl string
https://api.deepseek.com/v1
API base URL
Pinecone Embedding Store
apiKey *Requiredstring
The API key
cloud *Requiredstring
The cloud provider
index *Requiredstring
The index
region *Requiredstring
The cloud provider region
type *Requiredobject
namespace string
The namespace (default will be used if not provided)
WebSearch tool for Google Custom Search
apiKey *Requiredstring
API Key
csi *Requiredstring
Custom Search Engine ID
type *Requiredobject
Ollama Model Provider
endpoint *Requiredstring
Model endpoint
modelName *Requiredstring
Model name
type *Requiredobject
OpenAI Model Provider
apiKey *Requiredstring
API Key
modelName *Requiredstring
Model name
type *Requiredobject
baseUrl string
API base URL
WebSearch content retriever for Google Custom Search
apiKey *Requiredstring
API Key
csi *Requiredstring
Custom Search Engine ID
type *Requiredobject
maxResults integerstring
3
Maximum number of results to return
io.kestra.plugin.ai.domain.ChatConfiguration
seed integerstring
seed
temperature numberstring
Temperature
topK integerstring
topK
topP numberstring
topP
io.kestra.plugin.ai.domain.TokenUsage
inputTokenCount integer
outputTokenCount integer
totalTokenCount integer
Elasticsearch Embedding Store
connection *RequiredElasticsearch-ElasticsearchConnection
indexName *Requiredstring
The name of the index to store embeddings
type *Requiredobject
io.kestra.plugin.ai.rag.ChatCompletion-ContentRetrieverConfiguration
maxResults integer
3
The maximum number of results from the embedding store.
minScore number
0
The minimum score, ranging from 0 to 1 (inclusive). Only embeddings with a score >= minScore will be returned.
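For instance, to widen retrieval while filtering out weak matches, the configuration could be tuned like this (the values are illustrative):

```yaml
contentRetrieverConfiguration:
  maxResults: 5   # return up to 5 matching segments from the embedding store
  minScore: 0.6   # discard matches with a similarity score below 0.6
```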
Azure OpenAI Model Provider
endpoint *Requiredstring
API endpoint
The Azure OpenAI endpoint in the format: https://{resource}.openai.azure.com/
modelName *Requiredstring
Model name
type *Requiredobject
apiKey string
API Key
clientId string
Client ID
clientSecret string
Client secret
serviceVersion string
API version
tenantId string
Tenant ID
Qdrant Embedding Store
apiKey *Requiredstring
The API key
collectionName *Requiredstring
The collection name
host *Requiredstring
The database server host
port *Requiredintegerstring
The database server port
type *Requiredobject
Google VertexAI Model Provider
endpoint *Requiredstring
Endpoint URL
location *Requiredstring
Project location
modelName *Requiredstring
Model name
project *Requiredstring
Project ID
type *Requiredobject
Google Gemini Model Provider
apiKey *Requiredstring
API Key
modelName *Requiredstring
Model name
type *Requiredobject
In-memory Embedding Store that stores its serialized form as a Kestra K/V pair
type *Requiredobject
kvName string
{{flow.id}}-embedding-store
The name of the K/V entry to use
In-memory Chat Memory that stores its serialized form as a Kestra K/V pair
type *Requiredobject
drop booleanstring
false
Drop the memory at the end of the task.
By default, the memory ID is the value of the 'system.correlationId' label, so the same memory is shared by all tasks of the flow and its subflows.
If you want to remove the memory eagerly (before it expires), set drop: true on the last task of the flow so the memory is erased after that task executes.
memoryId string
{{ labels.system.correlationId }}
The memory ID. Defaults to the value of the 'system.correlationId' label, so a memory is valid for the whole flow execution, including its subflows.
messages integerstring
10
The maximum number of messages to keep inside the memory.
ttl string
PT1H
duration
The memory duration. Defaults to 1h.
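A sketch of a memory block with the defaults overridden (values are illustrative; the memoryId shown is the documented default, written out explicitly):

```yaml
memory:
  type: io.kestra.plugin.ai.memory.KestraKVMemory
  memoryId: "{{ labels.system.correlationId }}"  # default: shared across the flow and its subflows
  messages: 20   # keep at most 20 messages in the history
  ttl: PT2H      # expire the memory after two hours instead of the default 1h
  drop: true     # erase the memory eagerly when this task finishes
```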
WebSearch content retriever for Tavily Search
apiKey *Requiredstring
API Key
type *Requiredobject
maxResults integerstring
3
Maximum number of results to return
Milvus Embedding Store
token *Requiredstring
The token
type *Requiredobject
autoFlushOnDelete booleanstring
Whether to auto flush on delete
autoFlushOnInsert booleanstring
Whether to auto flush on insert
collectionName string
The collection name
If there is no such collection yet, it will be created automatically. Default value: "default".
consistencyLevel string
The consistency level
databaseName string
The database name
If not provided, the default database will be used.
host string
The host
Default value: "localhost"
idFieldName string
The id field name
indexType string
The index type
metadataFieldName string
The metadata field name
metricType string
The metric type
password string
The password
If user authentication and TLS are enabled, this parameter is required. See: https://milvus.io/docs/authenticate.md
port integerstring
The port
Default value: "19530"
retrieveEmbeddingsOnSearch booleanstring
Whether to retrieve embeddings on search
textFieldName string
The text field name
uri string
The uri
username string
The username
If user authentication and TLS are enabled, this parameter is required. See: https://milvus.io/docs/authenticate.md
vectorFieldName string
The vector field name
Anthropic AI Model Provider
apiKey *Requiredstring
API Key
modelName *Requiredstring
Model name
type *Requiredobject
WebSearch tool for Tavily Search
apiKey *Requiredstring
API Key
type *Requiredobject
Weaviate Embedding Store
apiKey *Requiredstring
Weaviate API key
Your Weaviate API key. Not required for local deployment.
host *Requiredstring
Weaviate host
The host of your cluster URL, e.g. "ai-4jw7ufd9.weaviate.network". Find it under Details of your Weaviate cluster.
type *Requiredobject
avoidDups booleanstring
Weaviate avoid dups
If true (default), WeaviateEmbeddingStore will generate a hashed ID based on the provided text segment, which avoids duplicate entries in the DB. If false, a random ID will be generated.
consistencyLevel string
ONE
QUORUM
ALL
Weaviate consistency level
Consistency level: ONE, QUORUM (default) or ALL.
grpcPort integerstring
gRPC port if used
metadataFieldName string
Weaviate metadata field name
The name of the metadata field to store. If not provided, will default to "_metadata".
metadataKeys array
Weaviate metadata keys
The list of metadata keys to store. If not provided, will default to an empty list.
objectClass string
Weaviate object class
The object class you want to store, e.g. "MyGreatClass". Must start with an uppercase letter. If not provided, defaults to "Default".
port integerstring
Weaviate port
The port, e.g. 8080. This parameter is optional.
scheme string
Weaviate scheme
The scheme of your cluster URL, e.g. "https". Find it under Details of your Weaviate cluster.
securedGrpc booleanstring
The gRPC connection is secured
useGrpcForInserts booleanstring
Use gRPC for inserts
Use GRPC instead of HTTP for batch inserts only. You still need HTTP configured for search.
io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection
hosts *Requiredarray
1
List of HTTP ElasticSearch servers.
Must be a URI like https://elasticsearch.com:9200, with scheme and port.
basicAuth Elasticsearch-ElasticsearchConnection-BasicAuth
Basic auth configuration.
headers array
List of HTTP headers to be sent with every request.
Must be a string with key and value separated by a colon, e.g. Authorization: Token XYZ.
pathPrefix string
Sets the path prefix for every request used by the HTTP client.
For example, if this is set to /my/path, then any client request will become /my/path/ + endpoint. In essence, every request's endpoint is prefixed by this pathPrefix.
The path prefix is useful for when ElasticSearch is behind a proxy that provides a base path or a proxy that requires all paths to start with '/'; it is not intended for other purposes and it should not be supplied in other scenarios.
strictDeprecationMode booleanstring
Whether the REST client should return any response containing at least one warning header as a failure.
trustAllSsl booleanstring
Trust all SSL CA certificates.
Use this if the server is using a self-signed SSL certificate.
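A sketch of an Elasticsearch-backed embedding store using this connection shape (host, index, and secret names are placeholders):

```yaml
embeddings:
  type: io.kestra.plugin.ai.embeddings.Elasticsearch
  indexName: kestra-embeddings
  connection:
    hosts:
      - https://elasticsearch.example.com:9200  # scheme and port are required
    basicAuth:
      username: elastic
      password: "{{ secret('ELASTIC_PASSWORD') }}"
    trustAllSsl: true  # only for self-signed dev clusters; avoid in production
```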
Model Context Protocol (MCP) HTTP client tool
sseUrl *Requiredstring
SSE URL to the MCP server
type *Requiredobject
timeout string
duration
Connection timeout
Amazon Bedrock Model Provider
accessKeyId *Requiredstring
AWS Access Key ID
modelName *Requiredstring
Model name
secretAccessKey *Requiredstring
AWS Secret Access Key
type *Requiredobject
modelType string
COHERE
COHERE
TITAN
Amazon Bedrock Embedding Model Type