Creating a simple chatbot with Langchain and Ollama
date: 06-05-2024
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation, or RAG, is a super cool technique that supercharges large language models (LLMs) like the ones we use in AI chatbots. Basically, it makes AI responses smarter by pulling in fresh, relevant information from external sources at query time.
Why RAG Rocks for LLMs!
Here's why you might want to consider using RAG if you're messing around with AI models:
- Keeps Things Fresh: RAG enables LLMs to pull in the latest data beyond their training datasets, diminishing the likelihood of delivering outdated or inaccurate responses.
- Keeps it Real: Instead of making up answers (yeah, AI does that sometimes), RAG helps keep the AI’s responses grounded in actual, factual data.
- Easy to Implement: You don't need to be a coding wizard to get RAG rolling, and it doesn't cost an arm and a leg to update your AI model.
In short, RAG is like giving your AI a mini-upgrade with each query, ensuring it's always on its A-game when answering questions or helping out users. It's a game-changer for making AI interactions a lot more reliable and useful.
Installation of the tools: Ollama, Langchain
Let's play a bit with Python and local LLMs. We will create a simple Python application that uses Ollama and Langchain to build a custom chatbot.
Creating the Python project
First, we will use poetry to create our Python project.
Use the following command to initialize your project:
```bash
poetry new ai-chatbot-rag
```
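Poetry scaffolds a minimal package for you. Depending on your Poetry version, the generated layout looks roughly like this:

```
ai-chatbot-rag
├── README.md
├── ai_chatbot_rag
│   └── __init__.py
├── pyproject.toml
└── tests
    └── __init__.py
```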
Next, let's understand the tools we will use for our application. Let's start with Ollama!
Ollama
Ollama is the easiest way to run an LLM locally. It's as simple as running `docker pull`. To get started with Ollama, download it from the official website.
To download a local LLM with Ollama, simply run `ollama pull <model-name>`. If we want to use the new Llama 3 model, we can run the following command:
```bash
ollama pull llama3
```
When the model download is complete, you can use the model with the following command:
```bash
ollama run llama3
```
You can also serve an API endpoint, and this is what we will use for our chatbot!
To do this, run the following command:
```bash
ollama serve
```
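To check that the server is up, you can hit the REST API directly. Here's a minimal sketch using only the Python standard library; it assumes Ollama is listening on its default port (11434) and that llama3 has already been pulled:

```python
import json
import urllib.request

# Non-streaming request to Ollama's generate endpoint
payload = json.dumps({
    "model": "llama3",
    "prompt": "Say hello in one sentence.",
    "stream": False,  # return a single JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

If this prints a greeting, the server is ready for our chatbot.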
Langchain
To supercharge our AI app and make things easier, we will use the Langchain framework.
This framework has some nice abstractions that will make it easier to create our RAG application.
Getting Started with Langchain
Here's a quick guide to getting Langchain up and running.
- Installation: Since we already have our project initialized, let's add the Langchain library! In your Python project, use the following command:
```bash
poetry add langchain
```
This command will add the Langchain library to your project.
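Note that the Ollama integration we import below lives in the community package. Depending on your Langchain version it may not be pulled in automatically, so if the import fails later, add it explicitly:

```bash
poetry add langchain-community
```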
Simple Chatbot using Langchain & Ollama
Let's create a simple chatbot.
First, we need to import what we need from Langchain:
```python
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate
```
We will use environment variables to make our application more flexible. Let's install python-dotenv:
```bash
poetry add python-dotenv
```
Create a `.env` file and add your local IP. Port 11434 is Ollama's default; if Ollama runs on the same machine as the app, `localhost` works too.
```
OLLAMA_BASE_URL="http://<local-ip>:11434"
```
We can read the environment variable with the following code:
```python
import os

from dotenv import load_dotenv

# Load variables from the .env file into the process environment
load_dotenv()

# Base URL of the local Ollama server
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL")
```
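If the variable is unset, `os.getenv` returns `None` and the Ollama client will fail later with a confusing error. A simple safeguard is to fall back to the default local endpoint (assuming Ollama runs on the same machine):

```python
# Fall back to the default local endpoint when the variable is unset
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
```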
Now let's create the LLM and the prompt.
With the code below, we can have a simple chatbot with a custom system prompt.
```python
llm = Ollama(model="llama3", base_url=OLLAMA_BASE_URL)

# Create the prompt template with a system message and a one-shot example
prompt_template = """<|system|>You are a helpful assistant. Your goal is to answer the user as best as you can<|end|>
<|user|>What is the capital of France?<|end|><|assistant|>
The capital of France is Paris<|end|>
<|user|>{question}<|end|><|assistant|>
"""
prompt = PromptTemplate.from_template(prompt_template)

# Pipe the prompt into the LLM (LCEL syntax)
chain = prompt | llm
```
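Before wiring up the interactive loop, you can sanity-check the chain with a single blocking call (the question here is just an example):

```python
# One-off, non-streaming call to verify everything is wired up
answer = chain.invoke({"question": "What is the capital of Germany?"})
print(answer)
```

Now let's add the interactive loop: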
```python
while True:
    question = input("Ask me anything: ")
    chunks = []
    # Stream the answer token by token as it is generated
    for chunk in chain.stream({"question": question}):
        print(chunk, end="", flush=True)
        chunks.append(chunk)
    print()  # newline after the streamed answer
```
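`chain.stream` yields the response in small chunks so the user sees output immediately, and collecting them in `chunks` gives you the full answer afterwards (for example with `"".join(chunks)`) if you want to log it or build a conversation history. Stop the loop with Ctrl+C.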
The full code would be the following:
```python
import os

from dotenv import load_dotenv
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

# Load variables from the .env file into the process environment
load_dotenv()

# Base URL of the local Ollama server
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL")

llm = Ollama(model="llama3", base_url=OLLAMA_BASE_URL)

# Create the prompt template with a system message and a one-shot example
prompt_template = """<|system|>You are a helpful assistant. Your goal is to answer the user as best as you can<|end|>
<|user|>What is the capital of France?<|end|><|assistant|>
The capital of France is Paris<|end|>
<|user|>{question}<|end|><|assistant|>
"""
prompt = PromptTemplate.from_template(prompt_template)

# Pipe the prompt into the LLM (LCEL syntax)
chain = prompt | llm

while True:
    question = input("Ask me anything: ")
    chunks = []
    # Stream the answer token by token as it is generated
    for chunk in chain.stream({"question": question}):
        print(chunk, end="", flush=True)
        chunks.append(chunk)
    print()  # newline after the streamed answer
```
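To try it out, save the script inside your package (say `ai_chatbot_rag/main.py`; the filename is up to you) and run it through Poetry, with `ollama serve` running in another terminal:

```bash
poetry run python ai_chatbot_rag/main.py
```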