Source
```yaml
id: chat-with-your-data
namespace: company.team

inputs:
  - id: question
    type: STRING
    defaults: How do I join the course after it has started?

  - id: select_a_zoomcamp
    type: SELECT
    defaults: data-engineering-zoomcamp
    values:
      - data-engineering-zoomcamp
      - machine-learning-zoomcamp
      - data-science-zoomcamp

tasks:
  - id: search
    type: io.kestra.plugin.elasticsearch.Search
    connection:
      hosts:
        - http://localhost:9200/
    indexes:
      - course_questions
    request:
      size: 5
      query:
        bool:
          must:
            multi_match:
              query: "{{ inputs.question }}"
              fields:
                - question
                - text
                - section
              type: best_fields
          filter:
            term:
              course: "{{ inputs.select_a_zoomcamp }}"

  - id: context_template
    type: io.kestra.plugin.core.debug.Return
    format: |
      {% for row in outputs.search.rows %}
      Section: {{ row.section }}
      Question: {{ row.question }}
      Text: {{ row.text }}
      {% endfor %}

  - id: generate_response
    type: io.kestra.plugin.openai.ChatCompletion
    apiKey: sk-proj-your-OpenAI-API-KEY
    model: gpt-4o
    maxTokens: 500
    prompt: >
      You're a course teaching assistant.
      Answer the user QUESTION based on CONTEXT - the documents retrieved from our FAQ database.
      Only use the facts from the CONTEXT.
      If the CONTEXT doesn't contain the answer, return "NONE".
      QUESTION: {{ inputs.question }}
      CONTEXT: {{ outputs.context_template.value }}

  - id: log_output
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.generate_response.choices | jq('.[].message.content') | first }}"
```
About this blueprint
This flow demonstrates how to use the OpenAI plugin to chat with your Elasticsearch data. The flow does the following:
- Search an Elasticsearch index for documents relevant to the user's question (an equivalent raw search request is shown right after this list).
- Parse the Elasticsearch query results into a context template that is used in the prompt to the LLM.
- Generate a response to the user's question based on the retrieved documents using the OpenAI API plugin.
- Log the response to the console.

The flow uses Elasticsearch as the document store.
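For reference, the `search` task issues a query equivalent to the following raw Elasticsearch request. This is a minimal sketch you can run against the index created in the setup steps below; the question and course values are hard-coded here for illustration:

```bash
# Same multi_match + course filter as the search task in the flow above
curl -X POST "http://localhost:9200/course_questions/_search" \
  -H "Content-Type: application/json" \
  -d '{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "How do I join the course after it has started?",
          "fields": ["question", "text", "section"],
          "type": "best_fields"
        }
      },
      "filter": {
        "term": { "course": "data-engineering-zoomcamp" }
      }
    }
  }
}'
```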
To set up the Elasticsearch database locally with Docker, use the following command:

```bash
docker run \
  --rm \
  --name elasticsearch \
  -m 2G \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3
```
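Once the container is running, a quick sanity check (assuming the default port mapping above) confirms that Elasticsearch is reachable:

```bash
# Should return a JSON document with the cluster name and version
curl http://localhost:9200
```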
Note that if you see an "Elasticsearch has quit unexpectedly" message, you can use the `-m` flag to increase the memory limit of your container.

To create a sample `course_questions` index, use the following command:
```bash
curl -X PUT "http://localhost:9200/course_questions" \
  -H "Content-Type: application/json" \
  -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "section": { "type": "text" },
      "question": { "type": "text" },
      "course": { "type": "keyword" }
    }
  }
}'
```
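Optionally, verify that the index exists and has the expected mapping:

```bash
# Returns the field mappings defined above (text, section, question, course)
curl "http://localhost:9200/course_questions/_mapping"
```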
Once your sample index is created, load data into it. We'll use a publicly available [DataTalksClub FAQ dataset](https://raw.githubusercontent.com/alexeygrigorev/llm-rag-workshop/main/notebooks/documents.json):
```bash
curl -X POST "http://localhost:9200/course_questions/_bulk" \
  -H "Content-Type: application/json" \
  --data-binary @<(curl -s https://huggingface.co/datasets/kestra/datasets/raw/main/json/zoomcamp_faq.json | jq -c '.[] | {"index":{}}, .')
```
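After the bulk load finishes, a quick check confirms that the documents were indexed (the exact count depends on the dataset):

```bash
# Total number of documents in the index
curl "http://localhost:9200/course_questions/_count"

# Number of documents for a single course (the same field the flow filters on)
curl -X POST "http://localhost:9200/course_questions/_count" \
  -H "Content-Type: application/json" \
  -d '{"query": {"term": {"course": "data-engineering-zoomcamp"}}}'
```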
To get an OpenAI API key:
- Create an [OpenAI](https://platform.openai.com/) account.
- Go to the [API keys](https://platform.openai.com/api-keys) section.
- Create a new key and copy it.
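Before pasting the key into the flow's `apiKey` property, you can sanity-check it with a direct call to the OpenAI API (a minimal example, assuming the key is exported as `OPENAI_API_KEY`):

```bash
export OPENAI_API_KEY="sk-proj-..."

# A valid key returns a chat completion; an invalid key returns a 401 error
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 20,
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}]
  }'
```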
**Big thanks to [Faithful Adeda](https://github.com/kestra-io/kestra/issues/4608) for contributing this blueprint! 🫶**