Engineering · 25 min read

Part 2: Ollama Advanced Use Cases and Integrations

Ollama isn’t just for local AI tinkering. It can be a powerful piece of a larger system—integrating with Open WebUI for a sleek interface, LiteLLM for API unification, and frameworks like LangChain for advanced workflows. In this deep dive, we explore how to extend Ollama beyond the basics, from fine-tuning custom models to real-world production setups. If you’ve been running models locally but want more control, scalability, and integration, this is for you.

Tega Adeyemi

In the first part of this series, we covered the basics of using Ollama to run large language models locally. Now, we will take a deeper dive into Ollama’s more advanced features and real-world integrations. Ollama isn’t just a toy for local experimentation; it offers capabilities that can be extended into production-like scenarios and combined with other tools in the AI ecosystem. In this article, we’ll explore some of the advanced use cases that Ollama enables, discuss what it means to use Ollama in production settings, and look at how it integrates with frameworks such as Open WebUI and LiteLLM. We’ll also highlight real-world examples and provide code snippets to illustrate these integrations in practice.

Stitching these integrations into something a real team can run on Monday is the focus of Cohorte's AI Engineering Foundations course (E1).

While Ollama’s core function is running LLMs locally, its value grows when you start to use it as part of a larger system. Whether you want a sleek web interface for your local models, or you wish to blend local and cloud AI services, Ollama can often be a central piece of the puzzle. Let’s explore these aspects step by step.

Exploring Ollama’s Advanced Features

Beyond the basic commands to pull and run models, Ollama offers advanced functionality that can significantly enhance your AI workflow.

In summary, Ollama’s advanced features make it much more than a basic CLI for running models. It is evolving into a full-fledged local AI platform with support for custom models, structured outputs, tool use, and more. These capabilities are particularly useful for developers who want to build applications on top of Ollama, which brings us to using Ollama in production scenarios.
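As a rough illustration of the structured-output capability mentioned above, the sketch below builds a request for Ollama’s native /api/chat endpoint with a JSON schema in the format field. The schema, the model name used in the usage note, and the helper names build_structured_request and ask are assumptions made for this example, and schema-constrained output needs a reasonably recent Ollama release.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address

# Hypothetical schema: the shape we want the model's answer to follow.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city", "country"],
}


def build_structured_request(model, prompt, schema):
    """Return the JSON body for a non-streaming /api/chat call with a schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,  # asks Ollama to constrain the output to this schema
        "stream": False,   # one JSON response instead of a stream of chunks
    }


def ask(model, prompt, schema, base_url=OLLAMA_URL):
    """Send the request to a running Ollama server and parse the JSON answer."""
    body = json.dumps(build_structured_request(model, prompt, schema)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    # The model's text is itself a JSON document that should match the schema.
    return json.loads(reply["message"]["content"])
```

With a local server running and a model pulled, a call such as ask("llama2", "Name the capital of Japan.", CITY_SCHEMA) should return a dictionary with city and country keys. How reliably a given model follows the schema is something to verify for your own setup.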

Using Ollama in Production

One common question is whether Ollama can be used in a production environment. The answer is nuanced. Ollama is primarily designed for local development and experimentation, and the maintainers caution that it was not originally intended for high-load production use (for example, the documentation notes that the API is not meant for heavy production usage). That does not rule it out as part of a production workflow in the right circumstances. Many users have successfully deployed Ollama in controlled production scenarios, especially when serving a limited number of users or running it as an internal service.

Considerations for using Ollama in production:

It’s worth noting that some users on forums have reported success running Ollama in production for their specific needs, often citing that it is a cost-effective solution if you already have the hardware. The key is to understand the constraints: Ollama shines in controlled, smaller-scale environments and may struggle, or require extra engineering, in high-scale cloud deployments. In those cases, you can treat Ollama as a stepping stone: it proves out your concept locally, and if you need to scale up dramatically, you can move to a more scalable serving stack later (the skills and code you develop with Ollama will still transfer quite well).

In summary, using Ollama in production is possible and practical for certain cases, especially when data privacy is paramount and scale is moderate. You should run thorough tests, monitor performance, and be prepared to tune the setup (in terms of concurrency and memory) to ensure it meets your production requirements.
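One way to keep concurrency under control from the client side is to cap how many requests are in flight at once. The sketch below is a minimal illustration of that idea, not a feature of Ollama itself: BoundedClient and run_batch are hypothetical helpers, and call_model stands in for whatever function actually sends a prompt to your server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedClient:
    """Wrap a model-calling function so at most `limit` calls run at once."""

    def __init__(self, call_model, limit=2):
        self._call_model = call_model  # e.g. a function that POSTs to Ollama
        self._slots = threading.Semaphore(limit)

    def __call__(self, prompt):
        with self._slots:  # blocks while `limit` calls are already in flight
            return self._call_model(prompt)


def run_batch(client, prompts, workers=8):
    """Submit prompts from a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(client, prompts))
```

A sensible limit depends on your hardware and model size, so treat the default of 2 as a placeholder and measure latency and memory use under realistic load before settling on a value.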

Integrations with Open WebUI, LiteLLM, and Other Frameworks

One of the great things about Ollama is that it can integrate with other tools to provide better user experiences or broader functionality. Let’s discuss a few notable integrations:

Open WebUI – A GUI for Your Local LLMs

Open WebUI (formerly called Ollama WebUI) is an open-source web interface that works with local LLM backends like Ollama. If you prefer chatting with your model through a browser, or want a shareable interface for non-technical users, Open WebUI is a good fit. It acts as the frontend, providing a ChatGPT-like chat experience, while Ollama runs in the backend serving the model.

How it works: Open WebUI runs as a web application (you can launch it via Docker very easily) and connects to the Ollama API. It supports multiple backends, but in our case we’d use the Ollama backend. Once set up, you access a local website (e.g., http://localhost:3000) where you can log in, select available models, and start chatting.

Setup: Assuming you have Docker installed, getting Open WebUI running is often a one-liner. For example, from a dev community guide, you can run:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

This command pulls and starts the Open WebUI container, mapping it to port 3000. The --add-host=host.docker.internal:host-gateway part ensures that the Docker container can communicate with Ollama running on the host machine (by resolving host.docker.internal to the host’s IP). Once Open WebUI is up, you typically create an account (just a local login for the interface) and then you can configure models.

Connecting to Ollama: Open WebUI needs to know how to talk to Ollama. Usually you specify the base URL of the Ollama API in its settings (http://host.docker.internal:11434 with the Docker setup above, when Ollama runs on the same host). Open WebUI supports Ollama out of the box, so it may detect the local service automatically if the networking is correct. In other configurations, you set an environment variable, for example in docker-compose: either OLLAMA_BASE_URL for the native Ollama connection, or OPENAI_API_BASE_URL pointing at Ollama’s OpenAI-compatible endpoint (the server address followed by /v1) together with a dummy OPENAI_API_KEY, since Ollama doesn’t need a key for local calls. The second option makes the WebUI treat Ollama as if it were the OpenAI API.
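Before pointing a UI at Ollama, it can help to confirm that the server is reachable at the URL you plan to use. The sketch below queries Ollama’s /api/tags endpoint, which lists locally available models. The helper names list_models and parse_model_names are made up for this example, and the default URL assumes Ollama is running on the same machine.

```python
import json
import urllib.request


def parse_model_names(tags_response):
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_models(base_url="http://localhost:11434"):
    """Return the models the Ollama server at `base_url` has available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

If list_models() raises a connection error from inside a container, the base URL is the usual suspect: from a container, localhost refers to the container itself, which is why the Docker command above maps host.docker.internal to the host.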

Once connected, any model you have pulled in Ollama becomes available in the web UI. Open WebUI even provides features like a model selection dropdown, prompt presets, and conversation saving. It’s a much richer experience for chatting compared to the raw terminal. You get things like message history in a nice format, the ability to edit or retry prompts, etc.

Use case: Imagine you have colleagues who aren’t comfortable with CLI. You can set up Open WebUI on a machine with Ollama and they can access a chat interface through their browser to use the models you’ve hosted. This could be on your local machine or a shared server. It’s a quick way to create a private ChatGPT alternative in your network, with a user-friendly interface.

Open WebUI can also connect to cloud APIs (for example, with an OpenAI API key), so you can switch between local and cloud models in one interface. This mix of local and cloud gives flexibility: for some tasks you use your private local model, for others a call to GPT-4, all from one place.

LiteLLM – Bridging APIs and Hybrid Workflows

LiteLLM is another tool that often comes up alongside Ollama and Open WebUI. LiteLLM acts as a proxy that provides an OpenAI-compatible API on one side and translates calls to various backends on the other (be it Ollama, Azure OpenAI, Amazon Bedrock, etc.). It’s essentially a middleware that can make different AI services look like the standard OpenAI API.

In practice, LiteLLM is useful if you want to integrate multiple AI providers in one interface. For example, suppose you want to use both your local Ollama model and a model deployed on Azure’s OpenAI service from Open WebUI. Open WebUI expects the standard OpenAI API spec, so it can’t call Azure OpenAI directly. LiteLLM can sit in between: Open WebUI sends a request to LiteLLM (which it thinks is “OpenAI”), and LiteLLM routes it to Azure’s endpoint (with the appropriate changes in endpoint URL, keys, etc.), then returns the result in the standard format. Similarly, LiteLLM can route to Ollama itself or other sources.

For hybrid usage, LiteLLM allows scenarios like: “If the user selects GPT-4 (which we have via Azure) in the UI, use that; if they select Llama-2 (which is local via Ollama), use that.” Both appear through a unified API. This is powerful in enterprise contexts where you might have some models in cloud and some on-prem, and want one interface to access all.

Integrating LiteLLM: In an Ollama + Open WebUI setup, you’d deploy LiteLLM (it can be a small server or container) configured with the routes to your desired endpoints. For instance, configure it so that requests with model name “azure/gpt4” go to Azure OpenAI, and requests with model name “ollama/llama2” go to your local Ollama. Open WebUI would then point to LiteLLM as its backend (instead of directly to Ollama). A lot of this can be orchestrated with Docker Compose: you’d have one service for Open WebUI, one for LiteLLM, and maybe one for Ollama, all networked together. One Docker Compose example from a blog links Open WebUI and LiteLLM by setting OPENAI_API_BASE_URL to the LiteLLM service and adding an extra_hosts entry so that the WebUI container can find the Ollama host machine.

The end result is a flexible AI stack: Open WebUI for UI, LiteLLM for intelligent routing, and Ollama (plus possibly others) for actual model serving. This setup means users can switch between models (local or cloud) seamlessly. For example, you might primarily use the local model to save costs, but if it fails or if you need a second opinion, you switch to a cloud model via the same interface.
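The “local first, cloud if it fails” pattern described above can also be expressed in a few lines of client code. This is only a sketch of the idea, not how LiteLLM implements it: with_fallback is a hypothetical helper, and primary and fallback stand for any two functions that take a prompt and return text.

```python
def with_fallback(primary, fallback, errors=(Exception,)):
    """Return a function that tries `primary` and uses `fallback` if it raises."""

    def ask(prompt):
        try:
            return primary(prompt)
        except errors as exc:
            # The local model is unavailable or failed: hand off to the backup.
            print(f"primary backend failed ({exc!r}); using fallback")
            return fallback(prompt)

    return ask
```

In a real deployment you would probably narrow errors to connection and timeout exceptions, so that a genuine bug in your own code is not silently masked by a paid cloud call.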

Other Frameworks and Integrations

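Beyond Open WebUI and LiteLLM, Ollama can integrate with numerous other tools and frameworks. For example, because Ollama exposes an OpenAI-compatible API, the OpenAI Python client can talk to it directly: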

# Requires: pip install "openai>=1.0"
from openai import OpenAI

# Ollama serves its OpenAI-compatible API under /v1 on the default port
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ignored",  # Ollama doesn't require a key, but the client needs one set
)
response = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)

The bottom line is that Ollama is quite interoperable. Because it exposes a standard API, it works as a plug-and-play component in many systems. Whether you want a nicer interface (Open WebUI), a combination of AI sources (via LiteLLM), or integration into coding frameworks and applications (SDKs, LangChain), there’s likely a way to use Ollama for it. This versatility means you can start locally in your terminal and later use the same local model in a web app or a custom tool without completely changing your setup.

Code Snippet: Using Ollama with a Web UI (Integration Example)

To illustrate one integration, here’s a small snippet of how you might use Open WebUI with Ollama using Docker Compose (a hypothetical example combining services):

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest  # official image; it starts `ollama serve` by default
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama  # persist models

  webui:
    image: ghcr.io/open-webui/open-webui:main
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      # Compose's internal DNS resolves the service name "ollama";
      # /v1 is Ollama's OpenAI-compatible endpoint
      OPENAI_API_BASE_URL: "http://ollama:11434/v1"
      OPENAI_API_KEY: "not_used_but_required"
      WEBUI_AUTH: "false"  # disable auth for simplicity
    volumes:
      - open-webui:/app/backend/data  # persist chats and settings

volumes:
  ollama-data:
  open-webui:

In this configuration, we have two services: ollama and webui. The WebUI is configured to talk to Ollama’s API. Running docker compose up starts a local Ollama server and Open WebUI together. You can then open http://localhost:3000 and use the interface, which sends your queries to the Ollama model. (In practice, check that the network names and addresses are correct for your environment; this is a simplified example.)

Code Snippet: Hybrid API via LiteLLM (Integration Example)

Another snippet demonstrating LiteLLM configuration (pseudo-code) might look like:

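litellm:
  image: ghcr.io/berriai/litellm:main-latest
  ports:
    - "8000:4000"  # the LiteLLM proxy listens on 4000 by default; expose it as 8000
  volumes:
    - ./litellm_config.yml:/app/config.yml
  environment:
    AZURE_API_KEY: "${AZURE_API_KEY}"  # passed through for the Azure route
  command: ["--config", "/app/config.yml", "--port", "4000"]
  extra_hosts:
    - "host.docker.internal:host-gateway"  # lets the container reach Ollama on the host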

And in litellm_config.yml, configure routes:

model_list:
  - model_name: "azure:gpt-35-turbo"       # name clients send in the "model" field
    litellm_params:
      model: "azure/gpt-35-turbo"          # azure/<your deployment name>
      api_base: "https://your-azure-endpoint.openai.azure.com/"
      api_version: "2023-05-15"
      api_key: "os.environ/AZURE_API_KEY"  # read from the environment
  - model_name: "llama2"
    litellm_params:
      model: "ollama/llama2"               # served by the local Ollama
      api_base: "http://host.docker.internal:11434"

With such a config, a request coming in to LiteLLM (at localhost:8000/v1/chat/completions) with model "azure:gpt-35-turbo" will be sent to Azure OpenAI, whereas model "llama2" will be sent to the local Ollama. Open WebUI or any client can simply use http://localhost:8000 as if it were the OpenAI API, and LiteLLM handles dispatching to the correct backend. This example is illustrative; actual LiteLLM config syntax might differ between versions, but it conveys the concept of routing by model name.
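To make the routing idea concrete, here is a small Python sketch of dispatching on a provider prefix in the model name. It mirrors the concept only and is not LiteLLM’s implementation: the BACKENDS table and the route function are hypothetical, and the URLs are the placeholders used in the config above.

```python
# Hypothetical routing table: provider prefix -> backend base URL.
BACKENDS = {
    "azure": "https://your-azure-endpoint.openai.azure.com/",
    "ollama": "http://host.docker.internal:11434",
}


def route(model, default="ollama"):
    """Pick a backend from an optional 'provider:' prefix on the model name.

    Returns (provider, model name to forward, backend base URL).
    """
    provider, sep, name = model.partition(":")
    if sep and provider in BACKENDS:
        return provider, name, BACKENDS[provider]
    # No known prefix: keep the full name, since Ollama tags such as
    # "llama2:7b" also contain a colon.
    return default, model, BACKENDS[default]
```

The fallthrough branch matters because Ollama model tags use a colon too, so a name like "llama2:7b" has to be treated as a local model rather than as an unknown provider.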

Real-World Case Studies and Examples

To ground our discussion, let’s look at a few real-world examples where Ollama and its integrations have been employed:

Each of these examples underscores a different aspect of Ollama’s versatility: internal Q&A bots, research tooling, hybrid cloud-offloading strategies, IDE integration, and even edge deployment. The common theme is leveraging local models for privacy, cost, or offline reasons, and using Ollama’s integrations to make the experience more user-friendly or to combine with other systems.

One more notable case: Google’s Firebase Genkit announced support for Ollama. This means developers using Google’s tooling for AI app deployment could incorporate Ollama as a backend. It’s a strong signal that even major platforms see the value in local-first AI solutions for production apps, likely for scenarios where developers want an option to run open models.

Code Snippets for Integrations

We’ve sprinkled some code examples above, but let’s add a couple more brief snippets that illustrate integration in code:

Using the Ollama Python SDK (Advanced Integration):

For a Python application, instead of calling HTTP endpoints directly, you can use the official SDK:

# Install via: pip install ollama
from ollama import Client

client = Client(host="http://localhost:11434")  # assume Ollama is running

# Example: streaming chat completion using a local model
messages = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "Suggest a 1-week itinerary for Japan."}
]
# stream=True makes chat() return an iterator of partial responses
for chunk in client.chat(model="llama2", messages=messages, stream=True):
    # Each chunk carries the next piece of the streaming answer
    print(chunk["message"]["content"], end="", flush=True)
print()

This uses Ollama’s Python client to send a chat prompt and stream back the result chunk by chunk (useful for showing incremental output in a UI). The SDK handles constructing the HTTP requests under the hood. Note that the first message defines a role (system) which sets context, showing that the API supports role-based messages similar to OpenAI’s ChatCompletion format. This is very developer-friendly for advanced apps.
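For a sense of what the SDK is doing under the hood, the sketch below parses the kind of newline-delimited JSON that Ollama’s /api/chat endpoint emits when streaming. The function name iter_chat_text is made up for this example, and it expects an iterable of raw response lines, such as the object returned by urllib.request.urlopen.

```python
import json


def iter_chat_text(lines):
    """Yield text fragments from streaming /api/chat response lines."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)  # each line is one standalone JSON object
        text = chunk.get("message", {}).get("content", "")
        if text:
            yield text
        if chunk.get("done"):
            break  # the final chunk signals the end of the answer
```

Joining the fragments with "".join(iter_chat_text(response)) rebuilds the full answer, while printing them one at a time gives the incremental display a chat UI needs.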

Integration with LangChain (pseudo-code):

# Install via: pip install langchain-openai
import os

from langchain_openai import OpenAI

# Point LangChain's OpenAI wrapper to Ollama's OpenAI-compatible endpoint
os.environ["OPENAI_API_BASE"] = "http://localhost:11434/v1"
os.environ["OPENAI_API_KEY"] = "something"  # dummy; Ollama ignores it

llm = OpenAI(model_name="mistral")  # LangChain will call our local model
prompt = "Q: What is 5+7?\nA:"
result = llm.invoke(prompt)
print(result)

LangChain’s OpenAI wrapper picks up these environment variables, so LangChain thinks it’s talking to OpenAI while it is actually querying Ollama. Advanced chains (with tools, memory) can be built on top of this llm object, enabling powerful workflows entirely with a local model.

Final Thoughts

Ollama’s advanced use cases and integrations demonstrate how a local LLM runner can be embedded into larger systems. From providing a user-friendly interface via Open WebUI to acting as a component in a sophisticated multi-provider setup with LiteLLM, Ollama proves to be highly adaptable. We discussed that while Ollama isn’t tailored for massive-scale production out-of-the-box, it certainly can be utilized in production-like environments where its strengths (privacy, control, cost saving) shine, as long as one is mindful of its limitations.

The integrative capability means you don’t have to use Ollama in isolation. If you need a feature it doesn’t have (like a UI or a certain cloud model), odds are you can combine it with another tool to get the best of both worlds. The growing ecosystem – including official SDKs, community UIs, and third-party frameworks – is turning Ollama into a cornerstone of local AI development.

In conclusion, advanced users can leverage Ollama to build real applications, not just experiments. Whether it’s an internal chatbot, a development assistant, or a component of a hybrid cloud solution, Ollama provides the local inference engine that powers it. In the next part of this series, we will focus specifically on using Ollama for AI model serving: how to set up Ollama as a persistent service, optimize it for performance, and apply it to serve AI models as a backend for applications.

Tega Adeyemi · March 18, 2025