IngestDocument
yaml
type: "io.kestra.plugin.ai.rag.IngestDocument"
Examples
yaml
id: document_ingestion
namespace: company.ai
tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    drop: true
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-24.md
Properties
embeddings *Required Non-dynamic
Definitions
Chroma Embedding Store
baseUrl *Required string
collectionName *Required string
type *Required object
Elasticsearch Embedding Store
connection *Required
  io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection
  hosts *Required array
    SubType string
    Min items: 1
  basicAuth
    io.kestra.plugin.ai.embeddings.Elasticsearch-ElasticsearchConnection-BasicAuth
    password string
    username string
  headers array
    SubType string
  pathPrefix string
  strictDeprecationMode boolean | string
  trustAllSsl boolean | string
indexName *Required string
type *Required object
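As a rough illustration of how the Elasticsearch properties above nest, the sketch below plugs that store into the ingestion flow from the Examples section. The store's type value is inferred from the connection class name listed above, and the host, index name, and the KV keys ES_USERNAME and ES_PASSWORD are placeholders chosen for this example, not values taken from the plugin documentation.

```yaml
id: document_ingestion_elasticsearch
namespace: company.ai

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    embeddings:
      # Assumed type identifier, derived from the listed connection class name
      type: io.kestra.plugin.ai.embeddings.Elasticsearch
      indexName: kestra-docs                      # placeholder index name
      connection:
        hosts:                                    # at least one host is required
          - https://localhost:9200                # placeholder host
        basicAuth:
          username: "{{ kv('ES_USERNAME') }}"     # placeholder KV key
          password: "{{ kv('ES_PASSWORD') }}"     # placeholder KV key
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-24.md
```

Only connection.hosts, indexName, and type are marked as required; basicAuth is included here only to show where the credentials would go.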
In-memory embedding store that stores data as Kestra KV pairs
type *Required object
kvName string
  Default: {{flow.id}}-embedding-store
MariaDB Embedding Store
createTable *Required boolean | string
databaseUrl *Required string
fieldName *Required string
password *Required string
tableName *Required string
type *Required object
username *Required string
columnDefinitions array
  SubType string
indexes array
  SubType string
metadataStorageMode string
  Default: COLUMN_PER_KEY
Milvus Embedding Store
token *Required string
type *Required object
autoFlushOnDelete boolean | string
autoFlushOnInsert boolean | string
collectionName string
consistencyLevel string
databaseName string
host string
idFieldName string
indexType string
metadataFieldName string
metricType string
password string
port integer | string
retrieveEmbeddingsOnSearch boolean | string
textFieldName string
uri string
username string
vectorFieldName string
MongoDB Atlas Embedding Store
collectionName *Required string
host *Required string
indexName *Required string
scheme *Required string
type *Required object
createIndex boolean | string
database string
metadataFieldNames array
  SubType string
options object
password string
username string
PGVector Embedding Store
database *Required string
host *Required string
password *Required string
port *Required integer | string
table *Required string
type *Required object
user *Required string
useIndex boolean | string
  Default: false
Pinecone Embedding Store
apiKey *Required string
cloud *Required string
index *Required string
region *Required string
type *Required object
namespace string
Qdrant Embedding Store
apiKey *Required string
collectionName *Required string
host *Required string
port *Required integer | string
type *Required object
Redis Embedding Store
host *Required string
port *Required integer | string
type *Required object
indexName string
  Default: embedding-index
Tablestore Embedding Store
accessKeyId *Required string
accessKeySecret *Required string
endpoint *Required string
instanceName *Required string
type *Required object
metadataSchemaList array
  com.alicloud.openservices.tablestore.model.search.FieldSchema
  analyzer string
    Possible Values: SingleWord, MaxWord, MinWord, Split, Fuzzy
  analyzerParameter
    com.alicloud.openservices.tablestore.model.search.analysis.AnalyzerParameter
  dateFormats array
    SubType string
  enableHighlighting boolean
  enableSortAndAgg boolean
  fieldName string
  fieldType string
    Possible Values: LONG, DOUBLE, BOOLEAN, KEYWORD, TEXT, NESTED, GEO_POINT, DATE, VECTOR, FUZZY_KEYWORD, IP, JSON, UNKNOWN
  index boolean
  indexOptions string
    Possible Values: DOCS, FREQS, POSITIONS, OFFSETS
  isArray boolean
  jsonType string
    Possible Values: FLATTEN, NESTED
  sourceFieldNames array
    SubType string
  store boolean
  subFieldSchemas array
    com.alicloud.openservices.tablestore.model.search.FieldSchema
    analyzer string
      Possible Values: SingleWord, MaxWord, MinWord, Split, Fuzzy
    analyzerParameter
    dateFormats array
      SubType string
    enableHighlighting boolean
    enableSortAndAgg boolean
    fieldName string
    fieldType string
      Possible Values: LONG, DOUBLE, BOOLEAN, KEYWORD, TEXT, NESTED, GEO_POINT, DATE, VECTOR, FUZZY_KEYWORD, IP, JSON, UNKNOWN
    index boolean
    indexOptions string
      Possible Values: DOCS, FREQS, POSITIONS, OFFSETS
    isArray boolean
    jsonType string
      Possible Values: FLATTEN, NESTED
    sourceFieldNames array
      SubType string
    store boolean
    subFieldSchemas array
    vectorOptions
  vectorOptions
    com.alicloud.openservices.tablestore.model.search.vector.VectorOptions
    dataType string
    dimension integer
    metricType string
      Possible Values: EUCLIDEAN, COSINE, DOT_PRODUCT
Weaviate Embedding Store
apiKey *Required string
host *Required string
type *Required object
avoidDups boolean | string
consistencyLevel string
  Possible Values: ONE, QUORUM, ALL
grpcPort integer | string
metadataFieldName string
metadataKeys array
  SubType string
objectClass string
port integer | string
scheme string
securedGrpc boolean | string
useGrpcForInserts boolean | string
provider *Required Non-dynamic
Definitions
Amazon Bedrock Model Provider
accessKeyId *Required string
modelName *Required string
secretAccessKey *Required string
type *Required object
baseUrl string
caPem string
clientPem string
modelType string
  Default: COHERE
  Possible Values: COHERE, TITAN
Anthropic AI Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
maxTokens integer | string
Azure OpenAI Model Provider
endpoint *Required string
modelName *Required string
type *Required object
apiKey string
baseUrl string
caPem string
clientId string
clientPem string
clientSecret string
serviceVersion string
tenantId string
DashScope (Qwen) Model Provider from Alibaba Cloud
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
  Default: https://dashscope-intl.aliyuncs.com/api/v1
caPem string
clientPem string
enableSearch boolean | string
maxTokens integer | string
repetitionPenalty number | string
Deepseek Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
  Default: https://api.deepseek.com/v1
caPem string
clientPem string
GitHub Models AI Model Provider
gitHubToken *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
Google Gemini Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
embeddingModelConfiguration
  io.kestra.plugin.ai.provider.GoogleGemini-EmbeddingModelConfiguration
  maxRetries integer | string
  outputDimensionality integer | string
  taskType string
    Possible Values: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION
  timeout string
    Format: duration
  titleMetadataKey string
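To show where the Google Gemini embedding options sit, here is a minimal sketch of a provider block that sets embeddingModelConfiguration. It only uses property names and enum values listed above; the specific numbers are illustrative assumptions, and whether a given model accepts a particular outputDimensionality should be checked against the model's own documentation. The timeout is written as an ISO-8601 duration, on the assumption that this is what the duration format expects.

```yaml
provider:
  type: io.kestra.plugin.ai.provider.GoogleGemini
  modelName: gemini-embedding-exp-03-07
  apiKey: "{{ kv('GEMINI_API_KEY') }}"
  embeddingModelConfiguration:
    taskType: RETRIEVAL_DOCUMENT   # one of the listed possible values
    outputDimensionality: 768      # illustrative value, not a documented default
    maxRetries: 3                  # illustrative value
    timeout: PT30S                 # assumed ISO-8601 duration syntax
```

This fragment replaces the provider block of an IngestDocument task; the rest of the task stays as in the Examples section.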
Google VertexAI Model Provider
endpoint *Required string
location *Required string
modelName *Required string
project *Required string
type *Required object
baseUrl string
caPem string
clientPem string
HuggingFace Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
  Default: https://router.huggingface.co/v1
caPem string
clientPem string
LocalAI Model Provider
baseUrl *Required string
modelName *Required string
type *Required object
caPem string
clientPem string
Mistral AI Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
OciGenAI Model Provider
compartmentId *Required string
modelName *Required string
region *Required string
type *Required object
authProvider string
baseUrl string
caPem string
clientPem string
Ollama Model Provider
endpoint *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
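For a setup that keeps embedding generation local, the following sketch swaps the provider for Ollama while keeping the Kestra KV store from the Examples section. The type identifier is an assumption made by analogy with io.kestra.plugin.ai.provider.GoogleGemini, and the endpoint and model name are placeholders for a locally running Ollama instance rather than values from the plugin documentation.

```yaml
id: document_ingestion_ollama
namespace: company.ai

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      # Assumed type identifier, following the GoogleGemini naming pattern
      type: io.kestra.plugin.ai.provider.Ollama
      endpoint: http://localhost:11434   # placeholder endpoint
      modelName: nomic-embed-text        # placeholder embedding model name
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-24.md
```

No API key appears here because the Ollama definition above lists only endpoint, modelName, and type as required.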
OpenAI Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
  Default: https://api.openai.com/v1
caPem string
clientPem string
OpenRouter Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
Watsonx AI Model Provider
apiKey *Required string
modelName *Required string
projectId *Required string
type *Required object
baseUrl string
caPem string
clientPem string
WorkersAI Model Provider
accountId *Required string
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
caPem string
clientPem string
ZhiPu AI Model Provider
apiKey *Required string
modelName *Required string
type *Required object
baseUrl string
  Default: https://open.bigmodel.cn/
caPem string
clientPem string
maxRetries integer | string
maxToken integer | string
stops array
  SubType string
documentSplitter Non-dynamic
Definitions
io.kestra.plugin.ai.rag.IngestDocument-DocumentSplitter
maxOverlapSizeInChars *Required integer
maxSegmentSizeInChars *Required integer
splitter string
  Default: RECURSIVE
  Possible Values: RECURSIVE, PARAGRAPH, LINE, SENTENCE, WORD
drop boolean | string
  Default: false
fromDocuments Non-dynamic array
Definitions
io.kestra.plugin.ai.rag.IngestDocument-InlineDocument
content *Required string
metadata object
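The documentSplitter and fromDocuments properties described above can be combined in one task. The sketch below ingests two inline documents and splits them by paragraph; the segment and overlap sizes, the document text, and the metadata keys are all illustrative assumptions.

```yaml
id: inline_document_ingestion
namespace: company.ai

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    documentSplitter:
      splitter: PARAGRAPH            # default is RECURSIVE
      maxSegmentSizeInChars: 1000    # illustrative value
      maxOverlapSizeInChars: 100     # illustrative value
    fromDocuments:
      - content: "Kestra is an orchestration platform."   # placeholder text
        metadata:
          source: inline-example                          # hypothetical metadata key
      - content: "Flows are defined in YAML."             # placeholder text
        metadata:
          source: inline-example
```

Both size properties are required whenever documentSplitter is set, so they are given explicitly even though the values here are arbitrary.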
fromExternalURLs array
  SubType string
fromInternalURIs array
  SubType string
fromPath string
metadata object
  SubType string
Outputs
embeddingStoreOutputs object
ingestedDocuments integer
inputTokenCount integer
outputTokenCount integer
totalTokenCount integer
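The outputs listed above can be read by later tasks through Kestra's outputs expression. As a small sketch, the flow below logs the document count and total token usage after ingestion. The task id log_result is hypothetical, and the fallback on totalTokenCount is a precaution under the assumption that some providers may not report token usage.

```yaml
id: document_ingestion_with_log
namespace: company.ai

tasks:
  - id: ingest
    type: io.kestra.plugin.ai.rag.IngestDocument
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      modelName: gemini-embedding-exp-03-07
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
    embeddings:
      type: io.kestra.plugin.ai.embeddings.KestraKVStore
    fromExternalURLs:
      - https://raw.githubusercontent.com/kestra-io/docs/refs/heads/main/content/blogs/release-0-24.md

  # Hypothetical follow-up task that reads the ingest task's outputs
  - id: log_result
    type: io.kestra.plugin.core.log.Log
    message: "Ingested {{ outputs.ingest.ingestedDocuments }} documents, total tokens: {{ outputs.ingest.totalTokenCount ?? 'n/a' }}"
```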
Metrics
indexed.documents counter
  Unit: records
input.token.count counter
  Unit: token
output.token.count counter
  Unit: token
total.token.count counter
  Unit: token