IngestDocument
IngestDocument
yaml
type: "io.kestra.plugin.ai.rag.IngestDocument"Examples
yaml
id: document_ingestion
namespace: company.ai
tasks:
- id: ingest
type: io.kestra.plugin.ai.rag.IngestDocument
provider:
type: io.kestra.plugin.ai.provider.GoogleGemini
modelName: gemini-embedding-exp-03-07
apiKey: "{{ kv('GEMINI_API_KEY') }}"
embeddings:
type: io.kestra.plugin.ai.embeddings.KestraKVStore
drop: true
fromExternalURLs:
- https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-24.md
Properties
embeddings *RequiredNon-dynamic
Definitions
Store embeddings in Chroma
baseUrl*Requiredstring
collectionName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Chromaio.kestra.plugin.langchain4j.embeddings.ChromaStore embeddings in Elasticsearch
connection*Required
io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection
hosts*Requiredarray
SubTypestring
Min items
1basicAuth
io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection-BasicAuth
passwordstring
usernamestring
headersarray
SubTypestring
pathPrefixstring
strictDeprecationModebooleanstring
trustAllSslbooleanstring
indexName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Elasticsearchio.kestra.plugin.langchain4j.embeddings.ElasticsearchPrototype embeddings in Kestra KV
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.KestraKVStoreio.kestra.plugin.langchain4j.embeddings.KestraKVStorekvNamestring
Default
{{flow.id}}-embedding-storeStore embeddings in MariaDB
createTable*Requiredbooleanstring
databaseUrl*Requiredstring
fieldName*Requiredstring
password*Requiredstring
tableName*Requiredstring
type*Requiredobject
username*Requiredstring
columnDefinitionsarray
SubTypestring
indexesarray
SubTypestring
metadataStorageModestring
Default
COLUMN_PER_KEYStore embeddings in Milvus
token*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Milvusio.kestra.plugin.langchain4j.embeddings.MilvusautoFlushOnDeletebooleanstring
autoFlushOnInsertbooleanstring
collectionNamestring
consistencyLevelstring
databaseNamestring
hoststring
idFieldNamestring
indexTypestring
metadataFieldNamestring
metricTypestring
passwordstring
portintegerstring
retrieveEmbeddingsOnSearchbooleanstring
textFieldNamestring
uristring
usernamestring
vectorFieldNamestring
Store embeddings in MongoDB Atlas
collectionName*Requiredstring
host*Requiredstring
indexName*Requiredstring
scheme*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.MongoDBAtlasio.kestra.plugin.langchain4j.embeddings.MongoDBAtlascreateIndexbooleanstring
databasestring
metadataFieldNamesarray
SubTypestring
optionsobject
passwordstring
usernamestring
Store embeddings with pgvector
database*Requiredstring
host*Requiredstring
password*Requiredstring
port*Requiredintegerstring
table*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.PGVectorio.kestra.plugin.langchain4j.embeddings.PGVectoruser*Requiredstring
useIndexbooleanstring
Default
falseStore embeddings in Pinecone
apiKey*Requiredstring
cloud*Requiredstring
index*Requiredstring
region*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Pineconeio.kestra.plugin.langchain4j.embeddings.Pineconenamespacestring
Store embeddings in Qdrant
apiKey*Requiredstring
collectionName*Requiredstring
host*Requiredstring
port*Requiredintegerstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Qdrantio.kestra.plugin.langchain4j.embeddings.QdrantStore embeddings in Redis
host*Requiredstring
port*Requiredintegerstring
type*Requiredobject
indexNamestring
Default
embedding-indexStore embeddings in Alibaba Tablestore
accessKeyId*Requiredstring
accessKeySecret*Requiredstring
endpoint*Requiredstring
instanceName*Requiredstring
type*Requiredobject
metadataSchemaListarray
com.alicloud.openservices.tablestore.model.search.FieldSchema
analyzerstring
Possible Values
SingleWordMaxWordMinWordSplitFuzzyanalyzerParameter
com.alicloud.openservices.tablestore.model.search.analysis.AnalyzerParameter
dateFormatsarray
SubTypestring
enableHighlightingboolean
enableSortAndAggboolean
fieldNamestring
fieldTypestring
Possible Values
LONGDOUBLEBOOLEANKEYWORDTEXTNESTEDGEO_POINTDATEVECTORFUZZY_KEYWORDIPJSONUNKNOWNindexboolean
indexOptionsstring
Possible Values
DOCSFREQSPOSITIONSOFFSETSisArrayboolean
jsonTypestring
Possible Values
FLATTENNESTEDsourceFieldNamesarray
SubTypestring
storeboolean
subFieldSchemasarray
com.alicloud.openservices.tablestore.model.search.FieldSchema
analyzerstring
Possible Values
SingleWordMaxWordMinWordSplitFuzzyanalyzerParameter
dateFormatsarray
SubTypestring
enableHighlightingboolean
enableSortAndAggboolean
fieldNamestring
fieldTypestring
Possible Values
LONGDOUBLEBOOLEANKEYWORDTEXTNESTEDGEO_POINTDATEVECTORFUZZY_KEYWORDIPJSONUNKNOWNindexboolean
indexOptionsstring
Possible Values
DOCSFREQSPOSITIONSOFFSETSisArrayboolean
jsonTypestring
Possible Values
FLATTENNESTEDsourceFieldNamesarray
SubTypestring
storeboolean
subFieldSchemasarray
vectorOptions
vectorOptions
com.alicloud.openservices.tablestore.model.search.vector.VectorOptions
dataTypestring
dimensioninteger
metricTypestring
Possible Values
EUCLIDEANCOSINEDOT_PRODUCTStore embeddings in Weaviate
apiKey*Requiredstring
host*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.embeddings.Weaviateio.kestra.plugin.langchain4j.embeddings.WeaviateavoidDupsbooleanstring
consistencyLevelstring
Possible Values
ONEQUORUMALLgrpcPortintegerstring
metadataFieldNamestring
metadataKeysarray
SubTypestring
objectClassstring
portintegerstring
schemestring
securedGrpcbooleanstring
useGrpcForInsertsbooleanstring
provider *RequiredNon-dynamic
Definitions
Use Amazon Bedrock models
accessKeyId*Requiredstring
modelName*Requiredstring
secretAccessKey*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.AmazonBedrockio.kestra.plugin.langchain4j.provider.AmazonBedrockbaseUrlstring
caPemstring
clientPemstring
modelTypestring
Default
COHEREPossible Values
COHERETITANUse Anthropic Claude models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.Anthropicio.kestra.plugin.langchain4j.provider.AnthropicbaseUrlstring
caPemstring
clientPemstring
maxTokensintegerstring
Use Azure OpenAI deployments
endpoint*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.AzureOpenAIio.kestra.plugin.langchain4j.provider.AzureOpenAIapiKeystring
baseUrlstring
caPemstring
clientIdstring
clientPemstring
clientSecretstring
serviceVersionstring
tenantIdstring
Use DashScope (Qwen) models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
baseUrlstring
Default
https://dashscope-intl.aliyuncs.com/api/v1caPemstring
clientPemstring
enableSearchbooleanstring
maxTokensintegerstring
repetitionPenaltynumberstring
Use DeepSeek models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.DeepSeekio.kestra.plugin.langchain4j.provider.DeepSeekbaseUrlstring
Default
https://api.deepseek.com/v1caPemstring
clientPemstring
Use GitHub Models via Azure AI Inference
gitHubToken*Requiredstring
modelName*Requiredstring
type*Requiredobject
baseUrlstring
caPemstring
clientPemstring
Use Google Gemini models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.GoogleGeminiio.kestra.plugin.langchain4j.provider.GoogleGeminibaseUrlstring
caPemstring
clientPemstring
embeddingModelConfiguration
io.kestra.plugin.ai.provider.GoogleGemini-EmbeddingModelConfiguration
maxRetriesintegerstring
outputDimensionalityintegerstring
taskTypestring
Possible Values
RETRIEVAL_QUERYRETRIEVAL_DOCUMENTSEMANTIC_SIMILARITYCLASSIFICATIONCLUSTERINGQUESTION_ANSWERINGFACT_VERIFICATIONtimeoutstring
titleMetadataKeystring
Use Google Vertex AI models
endpoint*Requiredstring
location*Requiredstring
modelName*Requiredstring
project*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.GoogleVertexAIio.kestra.plugin.langchain4j.provider.GoogleVertexAIbaseUrlstring
caPemstring
clientPemstring
Use Hugging Face Inference endpoints
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
baseUrlstring
Default
https://router.huggingface.co/v1caPemstring
clientPemstring
Use LocalAI OpenAI-compatible server
baseUrl*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.LocalAIio.kestra.plugin.langchain4j.provider.LocalAIcaPemstring
clientPemstring
Use Mistral models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.MistralAIio.kestra.plugin.langchain4j.provider.MistralAIbaseUrlstring
caPemstring
clientPemstring
Use OCI Generative AI models
compartmentId*Requiredstring
modelName*Requiredstring
region*Requiredstring
type*Requiredobject
authProviderstring
baseUrlstring
caPemstring
clientPemstring
Use local Ollama models
endpoint*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.Ollamaio.kestra.plugin.langchain4j.provider.OllamabaseUrlstring
caPemstring
clientPemstring
Use OpenAI models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.OpenAIio.kestra.plugin.langchain4j.provider.OpenAIbaseUrlstring
Default
https://api.openai.com/v1caPemstring
clientPemstring
Use OpenRouter models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.OpenRouterio.kestra.plugin.langchain4j.provider.OpenRouterbaseUrlstring
caPemstring
clientPemstring
Use IBM watsonx.ai models
apiKey*Requiredstring
modelName*Requiredstring
projectId*Requiredstring
type*Requiredobject
baseUrlstring
caPemstring
clientPemstring
Use Cloudflare Workers AI models
accountId*Requiredstring
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
Possible Values
io.kestra.plugin.ai.provider.WorkersAIio.kestra.plugin.langchain4j.provider.WorkersAIbaseUrlstring
caPemstring
clientPemstring
Use ZhiPu AI models
apiKey*Requiredstring
modelName*Requiredstring
type*Requiredobject
baseUrlstring
Default
https://open.bigmodel.cn/caPemstring
clientPemstring
maxRetriesintegerstring
maxTokenintegerstring
stopsarray
SubTypestring
bulkSize integerstring
Default
500documentSplitter Non-dynamic
Definitions
io.kestra.plugin.ai.rag.IngestDocument-DocumentSplitter
maxOverlapSizeInChars*Requiredinteger
maxSegmentSizeInChars*Requiredinteger
splitterstring
Default
RECURSIVEPossible Values
RECURSIVEPARAGRAPHLINESENTENCEWORDdrop booleanstring
Default
falsefromDocuments Non-dynamicarray
Definitions
io.kestra.plugin.ai.rag.IngestDocument-InlineDocument
content*Requiredstring
metadataobject
fromExternalURLs array
SubTypestring
fromInternalURIs array
SubTypestring
fromPath string
metadata object
SubTypestring
Outputs
embeddingStoreOutputs object
ingestedDocuments integer
inputTokenCount integer
outputTokenCount integer
totalTokenCount integer
Metrics
indexed.documents counter
Unit
recordsinput.token.count counter
Unit
tokenoutput.token.count counter
Unit
tokentotal.token.count counter
Unit
token