About this blueprint
AI API Variables
This flow demonstrates how to use the OpenAI plugin to chat with your ElasticSearch data. The flow will do the following:
- Search an ElasticSearch index for relevant documents based on a user's question.
- Parse the Elasticsearch query results into a context template used in a prompt to the LLM.
- Generate a response to the user's question based on the retrieved documents using the OpenAI API plugin.
- Log the response to the console.
The flow uses .
To set up the Elasticsearch database locally with Docker, use the following command:
bash
docker run -it \
--rm \
--name elasticsearch \
-m 2G \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.14.3
Note that if you see a
"Elasticsearch has quit unexpectedly"
message, you can use the-m
field to increase the memory limit of your container.
To create a sample course_questions
index, use the following command:
bash
curl -X PUT "http://localhost:9200/course_questions" -H "Content-Type: application/json" -d'
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"text": { "type": "text" },
"section": { "type": "text" },
"question": { "type": "text" },
"course": { "type": "keyword" }
}
}
}'
Once your sample index is created, load data into it. We'll use a publicly available DataTalksClub FAQ dataset:
bash
curl -X POST "http://localhost:9200/course_questions/_bulk" \
-H "Content-Type: application/json" \
--data-binary @<(curl -s https://huggingface.co/datasets/kestra/datasets/raw/main/json/zoomcamp_faq.json | jq -c '.[] | {"index":{}}, .')
To get access to the OpenAI API key:
Big thanks to Faithful Adeda for contributing this blueprint! 🫶
yaml
id: chat_with_your_data
namespace: company.team
inputs:
- id: question
type: STRING
defaults: How do I join the course after it has started?
- id: select_a_zoomcamp
type: SELECT
defaults: data-engineering-zoomcamp
values:
- data-engineering-zoomcamp
- machine-learning-zoomcamp
- data-science-zoomcamp
tasks:
- id: search
type: io.kestra.plugin.elasticsearch.Search
connection:
hosts:
- http://localhost:9200/
indexes:
- course_questions
request:
size: 5
query:
bool:
must:
multi_match:
query: "{{ inputs.question }}"
fields: ["question", "text", "section"]
type: best_fields
filter:
term:
course: "{{ inputs.select_a_zoomcamp }}"
- id: context_template
type: io.kestra.plugin.core.debug.Return
format: >
{% for row in outputs.search.rows %}
Section: {{ row.section }}
Question: {{ row.question }}
Text: {{ row.text }}
{% endfor %}
- id: generate_response
type: io.kestra.plugin.openai.ChatCompletion
apiKey: sk-proj-your-OpenAI-API-KEY
model: gpt-4o
maxTokens: 500
prompt: |
You're a course teaching assistant.
Answer the user QUESTION based on CONTEXT - the documents retrieved from our FAQ database.
Only use the facts from the CONTEXT.
If the CONTEXT doesn't contain the answer, return "NONE".
QUESTION: {{ inputs.question }}
CONTEXT: {{ outputs.context_template.value }}
- id: log_output
type: io.kestra.plugin.core.log.Log
message: "{{ outputs.generate_response.choices | jq('.[].message.content') | first }}"