Source
```yaml
id: chat-with-your-data
namespace: company.team

inputs:
  - id: question
    type: STRING
    defaults: How do I join the course after it has started?

  - id: select_a_zoomcamp
    type: SELECT
    defaults: data-engineering-zoomcamp
    values:
      - data-engineering-zoomcamp
      - machine-learning-zoomcamp
      - data-science-zoomcamp

tasks:
  - id: search
    type: io.kestra.plugin.elasticsearch.Search
    connection:
      hosts:
        - http://localhost:9200/
    indexes:
      - course_questions
    request:
      size: 5
      query:
        bool:
          must:
            multi_match:
              query: "{{ inputs.question }}"
              fields:
                - question
                - text
                - section
              type: best_fields
          filter:
            term:
              course: "{{ inputs.select_a_zoomcamp }}"

  - id: context_template
    type: io.kestra.plugin.core.debug.Return
    format: |
      {% for row in outputs.search.rows %}
      Section: {{ row.section }}
      Question: {{ row.question }}
      Text: {{ row.text }}
      {% endfor %}

  - id: generate_response
    type: io.kestra.plugin.openai.ChatCompletion
    apiKey: sk-proj-your-OpenAI-API-KEY
    model: gpt-4o
    maxTokens: 500
    prompt: >
      You're a course teaching assistant.
      Answer the user QUESTION based on CONTEXT - the documents retrieved from our FAQ database.
      Only use the facts from the CONTEXT.
      If the CONTEXT doesn't contain the answer, return "NONE".
      QUESTION: {{ inputs.question }}
      CONTEXT: {{ outputs.context_template.value }}

  - id: log_output
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.generate_response.choices | jq('.[].message.content') | first }}"
```
About this blueprint
This flow demonstrates how to use the OpenAI plugin to chat with your Elasticsearch data. The flow does the following:
- Search an Elasticsearch index for documents relevant to the user's question (an equivalent raw search request is shown right after this list).
- Parse the Elasticsearch query results into a context template that is used in the prompt to the LLM.
- Generate a response to the user's question based on the retrieved documents using the OpenAI API plugin.
- Log the response to the console.

The flow uses Elasticsearch as the document store.
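For reference, the `search` task issues a query equivalent to the following raw Elasticsearch request. This is a minimal sketch you can run against the index created in the setup steps below; the question and course values are hard-coded here for illustration:

```bash
# Same multi_match + course filter as the search task in the flow above
curl -X POST "http://localhost:9200/course_questions/_search" \
  -H "Content-Type: application/json" \
  -d '{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "How do I join the course after it has started?",
          "fields": ["question", "text", "section"],
          "type": "best_fields"
        }
      },
      "filter": {
        "term": { "course": "data-engineering-zoomcamp" }
      }
    }
  }
}'
```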
To set up the Elasticsearch database locally with Docker, use the following command:

```bash
docker run \
  --rm \
  --name elasticsearch \
  -m 2G \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.14.3
```
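Once the container is running, a quick sanity check (assuming the default port mapping above) confirms that Elasticsearch is reachable:

```bash
# Should return a JSON document with the cluster name and version
curl http://localhost:9200
```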
Note that if you see an "Elasticsearch has quit unexpectedly" message, you can use the `-m` flag to increase the memory limit of your container.

To create a sample `course_questions` index, use the following command:
```bash
curl -X PUT "http://localhost:9200/course_questions" \
  -H "Content-Type: application/json" \
  -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "text": { "type": "text" },
      "section": { "type": "text" },
      "question": { "type": "text" },
      "course": { "type": "keyword" }
    }
  }
}'
```
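Optionally, verify that the index exists and has the expected mapping:

```bash
# Returns the field mappings defined above (text, section, question, course)
curl "http://localhost:9200/course_questions/_mapping"
```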
Once your sample index is created, load data into it. We'll use a publicly available [DataTalksClub FAQ dataset](https://raw.githubusercontent.com/alexeygrigorev/llm-rag-workshop/main/notebooks/documents.json):
```bash
curl -X POST "http://localhost:9200/course_questions/_bulk" \
  -H "Content-Type: application/json" \
  --data-binary @<(curl -s https://huggingface.co/datasets/kestra/datasets/raw/main/json/zoomcamp_faq.json | jq -c '.[] | {"index":{}}, .')
```
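After the bulk load finishes, a quick check confirms that the documents were indexed (the exact count depends on the dataset):

```bash
# Total number of documents in the index
curl "http://localhost:9200/course_questions/_count"

# Number of documents for a single course (the same field the flow filters on)
curl -X POST "http://localhost:9200/course_questions/_count" \
  -H "Content-Type: application/json" \
  -d '{"query": {"term": {"course": "data-engineering-zoomcamp"}}}'
```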
To get an OpenAI API key:
- Create an [OpenAI](https://platform.openai.com/) account.
- Go to the [API keys](https://platform.openai.com/api-keys) section.
- Create a new key and copy it.
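Before pasting the key into the flow's `apiKey` property, you can sanity-check it with a direct call to the OpenAI API (a minimal example, assuming the key is exported as `OPENAI_API_KEY`):

```bash
export OPENAI_API_KEY="sk-proj-..."

# A valid key returns a chat completion; an invalid key returns a 401 error
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 20,
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}]
  }'
```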
**Big thanks to [Faithful Adeda](https://github.com/kestra-io/kestra/issues/4608) for contributing this blueprint! 🫶**