A Comprehensive Guide to Ollama
Ollama is an exciting and versatile tool designed to help you integrate large language models (LLMs) into your applications seamlessly. Whether you are building custom AI solutions or experimenting with natural language interfaces, Ollama provides a powerful platform for creating, managing, and serving LLM models. This blog will walk you through the core concepts of Ollama, how to get started, and how you can use Ollama effectively in your AI projects.
What is Ollama?
Ollama is a framework designed to make working with large language models simple and intuitive. It is particularly suited for developers who want to experiment with natural language interfaces, build applications that involve LLMs, or create custom AI-powered tools. Ollama allows you to import, modify, and serve models while integrating easily with existing software systems.
Ollama stands out from closed-source API services thanks to its flexibility, ease of use, and open approach. Unlike many other solutions, Ollama allows you to host and manage models locally, providing greater control over data privacy and reducing dependence on third-party services. This is especially useful for organizations that prioritize data security or need to comply with specific regulations. Furthermore, Ollama's compatibility with OpenAI's API means that developers can easily switch from proprietary, closed-source platforms to a more customizable, self-hosted solution, while still leveraging the benefits of cutting-edge LLM technology.
Ollama offers:
- Seamless Model Management: Tools to import and modify models.
- API Interface: An API that simplifies interactions with LLMs.
- Docker and Cross-Platform Support: Easy setup using Docker, with support for Linux, Windows, and macOS.
- Data Privacy and Control: Host models locally to ensure data privacy and comply with organizational or regulatory requirements.
- OpenAI Compatibility: Easily migrate existing applications that use OpenAI by switching endpoints, providing a cost-effective and open alternative.
Whether you are building chatbots or data extraction tools, or simply want to enable intelligent text generation, Ollama can help you scale and manage the LLMs you need.
Getting Started with Ollama
Quick Installation Guide
To get started with Ollama, you first need to install it. Ollama supports different environments, including macOS, Linux, Windows, and Docker.
1. Installation on macOS
To install Ollama on macOS, use the following command:
brew install ollama
2. Installation on Linux
To install Ollama on Linux, you can follow these steps:
- First, update your package index and install prerequisites:
sudo apt update && sudo apt install -y curl unzip
- Download and run the Ollama installation script:
curl -fsSL https://ollama.com/install.sh | sh
This script will install the latest version of Ollama on your system.
3. Installation on Windows
On Windows, you can install Ollama using the following command:
choco install ollama
Make sure you have Chocolatey installed on your system before running the command, or download the official Windows installer from ollama.com instead.
4. Running Ollama with Docker
Ollama also supports Docker, allowing you to run it in a containerized environment. To use Docker, first install Docker on your system, and then you can pull the Ollama Docker image:
docker pull ollama/ollama:latest
To run Ollama using Docker:
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama:latest
This command will start Ollama in the background and expose its API on port 11434 (Ollama's default port), with model data persisted in the ollama volume.
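Once the container is up, you can pull and run a model inside it; llama3.1 here is just an example model name:
docker exec -it ollama ollama run llama3.1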
Installation and Setup Details
Ollama itself ships as a single self-contained binary that bundles the server, the command-line interface, and model management. Around it sits a small ecosystem of official client libraries (for Python and JavaScript) and community integrations, which you install separately as needed.
A list of official libraries and community integrations is maintained in the Ollama GitHub repository.
Quickstart Installation from Pip
To interact with a running Ollama server from Python, install the official client library:
pip install ollama
This package is a thin client for Ollama's REST API; the Ollama server itself must already be installed and running (see the installation steps above). Note that the server downloads and stores model weights locally; use the OLLAMA_MODELS environment variable to control where they are saved.
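As a quick check that everything is wired up, you can send a prompt from Python. This is a minimal sketch, assuming the server is running locally and you have already pulled a model, for example with ollama pull llama3.1:
import ollama

# Send a single chat message to the local Ollama server and print the reply.
response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])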
Important: Environment Setup
Ollama runs open models locally, so no API key is needed to use it: you simply pull the models you want (for example Llama 3.1 or Mistral) with ollama pull. If you access Ollama through its OpenAI-compatible endpoint using the official OpenAI client libraries, those libraries still require an api_key value to be set, but Ollama does not check it, so any placeholder string will do.
Depending on which other tools and integrations you combine with Ollama, additional environment variables or tokens might be required; consult the documentation of each integration.
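The Ollama server itself also honors a couple of useful environment variables, shown here for a Linux or macOS shell:
# Change where Ollama stores downloaded model weights (default: ~/.ollama/models).
export OLLAMA_MODELS=/data/ollama/models
# Bind the server to a different address or port (default: 127.0.0.1:11434).
export OLLAMA_HOST=0.0.0.0:11434
ollama serve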
Installation from Source
If you prefer to build Ollama from source, you can follow these steps (a recent Go toolchain is required; see the development documentation in the repository for platform-specific prerequisites):
- Clone the repository and change into it:
git clone https://github.com/ollama/ollama.git
cd ollama
- Build the binary:
go build .
- (Optional) During development, run the server directly from source:
go run . serve
The resulting ollama binary contains both the server and the command-line interface.
Importing and Managing Models
One of the most powerful features of Ollama is the ability to import and manage LLM models. You can easily import models from various formats, customize them, and use them in your projects.
Importing Models
To import a model, you use a Modelfile, which defines how the model should be created. The Modelfile is similar in spirit to a Dockerfile: it specifies the base weights along with settings such as inference parameters, the prompt template, and a system prompt.
Here is an example of a Modelfile that imports a local GGUF weights file (the filename is just an example):
FROM ./my-custom-model.gguf
PARAMETER temperature 0.7
SYSTEM """You are a concise, helpful assistant."""
To create the model from the Modelfile, run:
ollama create my-custom-model -f Modelfile
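You can also base a custom model on one that is already in the Ollama library instead of local weights, which is a handy way to bake in parameters and a system prompt (llama3.1 below is just an example base model):
FROM llama3.1
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM """You answer questions about classical literature as briefly as possible."""
Create it the same way with ollama create literature-helper -f Modelfile, and it will appear in ollama list.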
Listing Available Models
To list all the available models on your system, use the following command:
ollama list
This command will display a list of all imported models, along with their details.
Using Models
After importing a model, you can start using it right away. You can send a prompt to the model with the run command:
ollama run my-custom-model "What is the capital of France?"
The model will respond with an answer such as "Paris."
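The same model can also be queried over Ollama's built-in REST API, which listens on port 11434 by default:
curl http://localhost:11434/api/generate -d '{
  "model": "my-custom-model",
  "prompt": "What is the capital of France?",
  "stream": false
}'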
Tool Support
Ollama now supports tool calling with popular models such as Llama 3.1. This feature enables models to answer a given prompt using tools they know about, allowing them to perform more complex tasks or interact with the outside world.
Example Tools
- Functions and APIs: Use functions to retrieve real-time data, such as weather or financial information.
- Web Browsing: Search the web for answers that require current information.
- Code Interpreter: Execute code snippets to perform calculations or data processing tasks.
To enable tool calling, provide a list of available tools via the tools field in Ollama's API:
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'What is the weather in Toronto?'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather for a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    }],
)

print(response['message']['tool_calls'])
Supported models will now answer with a tool_calls response, enabling seamless integration of tools for complex queries.
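What happens next is up to your application: you execute the requested function yourself and hand the result back to the model. Here is a minimal sketch of that round trip, continuing from the response above and using a hypothetical get_current_weather helper in place of a real weather API:
import json

def get_current_weather(city):
    # Hypothetical helper: a real implementation would call an actual weather API.
    return json.dumps({'city': city, 'temperature_c': 12, 'conditions': 'partly cloudy'})

messages = [{'role': 'user', 'content': 'What is the weather in Toronto?'}]
messages.append(response['message'])  # keep the assistant's tool-call turn in the history

for call in response['message']['tool_calls']:
    if call['function']['name'] == 'get_current_weather':
        result = get_current_weather(**call['function']['arguments'])
        # Feed the tool output back to the model as a 'tool' message.
        messages.append({'role': 'tool', 'content': result})

final = ollama.chat(model='llama3.1', messages=messages)
print(final['message']['content'])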
Using the Ollama API
Ollama provides an API that allows developers to interact with models programmatically. The API is compatible with OpenAI's API, making it easier to integrate with existing applications that already use OpenAI models.
Example API Request
You can send a request to the Ollama API to generate text from a model:
curl -X POST "http://localhost:11434/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-model",
    "prompt": "Explain the process of photosynthesis."
  }'
The server will respond with the generated output.
Working with Docker
Running Ollama in Docker provides a portable way to use it in production environments. This makes it easy to deploy models as microservices that can scale with your application's needs.
You can use Docker Compose to create a full deployment setup for Ollama, including configuring persistent storage and exposing necessary ports.
Here is an example Docker Compose file:
version: '3'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
volumes:
  ollama_data:
To run the Docker Compose setup, use:
docker-compose up -d
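Models still need to be pulled into the container's volume once; for example (llama3.1 is just an example model name):
docker-compose exec ollama ollama pull llama3.1
Because the model data lives in the named volume, pulled models survive container restarts and image upgrades.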
Creating Custom Workflows with Ollama
Ollama also provides tools to build workflows involving multiple LLMs, allowing you to create more complex applications. For example, you can use one model for data extraction and another for summarization.
Here's a simple workflow example:
- Data Extraction: Use a specialized LLM to extract relevant information from a document.
- Summarization: Feed the extracted data into another model to generate a concise summary.
You can script this workflow using the Ollama command-line interface or the API.
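Here is a rough sketch of such a two-step pipeline using the Python client; the model name, prompts, and sample text are placeholders rather than a prescribed setup:
import ollama

document = """The quarterly meeting covered the launch timeline, the hiring plan
for the support team, and the decision to postpone the pricing change to Q3."""

# Step 1: extraction - pull the key facts out of the raw text.
extraction = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user',
               'content': 'List the key facts in this text as bullet points:\n' + document}],
)
facts = extraction['message']['content']

# Step 2: summarization - condense the extracted facts into a single sentence.
summary = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user',
               'content': 'Summarize these facts in one sentence:\n' + facts}],
)
print(summary['message']['content'])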
OpenAI Compatibility
One of the benefits of Ollama is its compatibility with the OpenAI API, which means that applications built to work with OpenAI can easily be migrated to work with Ollama.
Simply replace the OpenAI endpoint with Ollama's endpoint, and you can start using your existing applications with your custom models without major modifications.
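For example, with the official OpenAI Python library, pointing base_url at a local Ollama server is typically all that is required; as noted earlier, the api_key value is mandatory for the library but ignored by Ollama, and llama3.1 is just an example model:
from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

completion = client.chat.completions.create(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Explain the process of photosynthesis briefly.'}],
)
print(completion.choices[0].message.content)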
Getting Started Example: Using LangChain with Ollama in Python
Let's imagine we are studying the classics, such as the Odyssey by Homer. We might have a question about Neleus and his family. If you ask Llama2 for that information, you may get something like:
I apologize, but I'm a large language model, I cannot provide information on individuals or families that do not exist in reality. Neleus is not a real person or character, and therefore does not have a family or any other personal details. My apologies for any confusion. Is there anything else I can help you with?
This sounds like a typical censored response, but even Llama2-uncensored gives a mediocre answer:
Neleus was a legendary king of Pylos and the father of Nestor, one of the Argonauts. His mother was Clymene, a sea nymph, while his father was Neptune, the god of the sea.
So let's figure out how we can use LangChain with Ollama to ask our question to the actual document, the Odyssey by Homer, using Python.
Let's start by asking a simple question that we can get an answer to from the Llama2 model using Ollama. First, we need to install the LangChain packages:
pip install langchain langchain_community
Then we can create a model and ask the question:
from langchain_community.llms import Ollama

ollama = Ollama(
    base_url='http://localhost:11434',
    model="llama2"
)
print(ollama.invoke("why is the sky blue"))
Notice that we are defining the model and the base URL for Ollama.
Now let's load a document to ask questions against. I'll load up the Odyssey by Homer, which you can find at Project Gutenberg. We will need WebBaseLoader, which is part of LangChain and loads text from any webpage. On my machine, I also needed to install bs4 to get that to work, so run pip install bs4.
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()
This file is pretty big: the preface alone is 3,000 tokens, which means the full document won't fit into the model's context window. So we need to split it up into smaller pieces.
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
It's split up, but we still have to find the relevant splits and submit those to the model. We can do this by creating embeddings and storing them in a vector database. We can use Ollama directly to instantiate an embedding model, and we will use ChromaDB as the vector database in this example. Run pip install chromadb. We also need to pull an embedding model: ollama pull nomic-embed-text.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)
Now let's ask a question about the document: who was Neleus, and who is in his family? Neleus is a character in the Odyssey, and the answer can be found in our text.
question = "Who is Neleus and who is in Neleus' family?"
docs = vectorstore.similarity_search(question)
len(docs)
This outputs the number of chunks retrieved as being most similar to the question.
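If you are curious what was retrieved, you can print the start of the top match (purely for inspection):
print(docs[0].page_content[:300])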
The next thing is to send the question and the relevant parts of the docs to the model and see if we can get a good answer. Stitching the retrieval step and the generation step together like this is what LangChain calls a chain, so we need to define one:
from langchain.chains import RetrievalQA
qachain = RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())
res = qachain.invoke({"query": question})
print(res['result'])
The answer received from this chain was:
Neleus is a character in Homer's "Odyssey" and is mentioned in the context of Penelope's suitors. Neleus is the father of Chloris, who is married to Neleus and bears him several children, including Nestor, Chromius, Periclymenus, and Pero. Amphinomus, the son of Nisus, is also mentioned as a suitor of Penelope and is known for his good natural disposition and agreeable conversation.
It's not a perfect answer, as it implies Neleus married his daughter when actually Chloris "was the youngest daughter to Amphion son of Iasus and king of Minyan Orchomenus, and was Queen in Pylos".
I updated the chunk_overlap for the text splitter to 20 and tried again and got a much better answer:
Neleus is a character in Homer's epic poem "The Odyssey." He is the husband of Chloris, who is the youngest daughter of Amphion son of Iasus and king of Minyan Orchomenus. Neleus has several children with Chloris, including Nestor, Chromius, Periclymenus, and Pero.
And that is a much better answer.
Troubleshooting and Community Support
If you run into issues with Ollama, there is a comprehensive troubleshooting guide available, covering common problems and solutions. The community is also very active on platforms like Discord, making it easy to ask questions and get help.
To join the community or find more resources, visit the Ollama GitHub repository at https://github.com/ollama/ollama or the official Ollama Discord server.
Final Thoughts
Ollama is a powerful tool for developers who want to integrate LLMs into their applications seamlessly. With support for Docker, easy model import, OpenAI compatibility, tool calling features, and an intuitive API, Ollama provides all the tools you need to get up and running with LLMs. Whether you're building chatbots, data extraction workflows, or deploying models as microservices, Ollama makes it easy to create context-augmented applications that take full advantage of language models.
Get started with Ollama today, and take your AI projects to the next level.
Cohorte Team
November 13, 2024