Engineering · 6 min read

DSPy, De-Risked: A Practical Guide to LLM System Programming & Auto-Optimisation.

Build sharper LLM systems with DSPy: tool-using agents, real metrics, and MIPROv2 auto-optimisation—actionable code, faster results, 2025 ready.

Tega Adeyemi

Why this post and how to use it

We wrote this as a no-nonsense, developer-first guide to DSPy—the framework for programming LLM systems (not just prompting). You’ll get:

Everything here was checked against current DSPy docs and examples. We cite the exact pages beneath each major section so you can jump into the source.

TL;DR (Key Takeaways)

1) The DSPy mental model (10 seconds)

2) Quick start you won’t have to rewrite later

Secure key handling first—avoid hard-coding secrets.

pip install -U dspy
export OPENAI_API_KEY=...  # or your provider key

# dspystart.py
import dspy

# 1) Configure the LM (respect env var for keys)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)  # can also pass tracing config here later

This is the current pattern in DSPy quickstarts/tutorials.
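Because the key is read from the environment, it can help to fail fast when it is missing instead of finding out on the first LM call. The following is a minimal sketch under that assumption; `require_env` is a hypothetical helper written for illustration and is not part of DSPy.

```python
import os

def require_env(name: str, env=None) -> str:
    """Return a non-empty environment value or raise a clear error.

    Hypothetical helper for illustration; not a DSPy API.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value

# Demo with an explicit mapping so the sketch runs without a real key.
key = require_env("OPENAI_API_KEY", env={"OPENAI_API_KEY": "placeholder-value"})
print("key loaded:", bool(key))  # report presence only, never the key itself
```

Calling `require_env("OPENAI_API_KEY")` at the top of your script, before `dspy.configure`, would surface a missing key immediately.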

3) Hello, program: signatures, modules, and a baseline

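import dspy
from dspy import Signature, Predict

class AnswerQuestion(Signature):
    """Short factual QA."""
    # Every signature field must be declared as an input or an output.
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

qa = Predict(AnswerQuestion)

print(qa(question="Who founded Stanford?").answer)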

Evaluate it so you have a baseline:

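import dspy
from dspy.evaluate import answer_exact_match

# Evaluate expects dspy.Example objects; with_inputs marks the fields passed to the program.
devset = [
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="2+2?", answer="4").with_inputs("question"),
]

evaluate = dspy.Evaluate(devset=devset, metric=answer_exact_match)
score = evaluate(qa)
print("Baseline score:", score)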

Evaluate and off-the-shelf metrics (exact match, SemanticF1) ship with DSPy, so a baseline takes only a few lines.
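To make "exact match" concrete, here is a rough sketch of a normalized exact-match metric written as a plain callable. It assumes the `(example, pred, trace=None)` argument shape that DSPy metrics conventionally take; the normalization rules are illustrative and are not a copy of DSPy's own implementation. `SimpleNamespace` objects stand in for real examples and predictions so the sketch runs without an LM.

```python
import re
import string
from types import SimpleNamespace

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(example, pred, trace=None) -> bool:
    """Return True when gold and predicted answers agree after normalization."""
    return normalize(example.answer) == normalize(pred.answer)

# Stand-in objects; in a real run these come from the dev set and the program.
gold = SimpleNamespace(answer="Paris")
print(exact_match(gold, SimpleNamespace(answer="paris.")))
print(exact_match(gold, SimpleNamespace(answer="Lyon")))
```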

4) Turn it into a tool-using agent (ReAct) the correct way

Rules that save hours:

import dspy
from dspy import Signature

# 1) Define your task signature
class SearchAndAnswer(Signature):
    query: str = dspy.InputField()
    answer: str = dspy.OutputField()

# 2) Define tools as plain functions with type hints
def search_web(query: str, k: int = 5) -> str:
    """Return concatenated snippets for top-k search results."""
    # call your search service here
    return "snippet 1 ... snippet 2 ..."

def summarize(text: str, max_words: int = 80) -> str:
    """Return the first sentence of text, capped at max_words words."""
    first_sentence = text.split(".")[0]
    return " ".join(first_sentence.split()[:max_words])

# 3) Build a ReAct agent with tools
agent = dspy.ReAct(SearchAndAnswer, tools=[search_web, summarize], max_iters=6)

print(agent(query="What is DSPy and why use it?").answer)

This aligns with the official ReAct module signature: dspy.ReAct(signature, tools: list[Callable], max_iters=10).
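Type hints and docstrings matter because they are the only description of a tool that can be derived from a plain function. The sketch below uses the standard-library `inspect` module to show what is recoverable. `describe_tool` is a hypothetical helper, and this is an illustration of the idea rather than how DSPy builds its tool descriptions internally.

```python
import inspect
from typing import get_type_hints

def search_web(query: str, k: int = 5) -> str:
    """Return concatenated snippets for top-k search results."""
    return "snippet 1 ... snippet 2 ..."

def describe_tool(func) -> dict:
    """Collect the name, docstring, and argument types of a plain function.

    Hypothetical helper for illustration; assumes simple annotations like str or int.
    """
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        # Parameters without an annotation are reported as "str" in this sketch.
        "args": {name: hints.get(name, str).__name__ for name in params},
        "returns": hints.get("return", str).__name__,
    }

print(describe_tool(search_web))
```

A function with no hints or docstring would yield an empty description and generic argument types, which gives an agent very little to reason about when choosing a tool.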

Note: DSPy doesn’t ship a built-in BM25 retriever. If you want BM25, wrap your retriever as a tool (like search_web) or build a small RAG stage and expose it via a function.

5) RAG, briefly—DSPy style

You can write a tiny RAG pipeline—split/index with your favorite library, fetch top-k passages, and expose a retrieve(query) tool for ReAct. The official tutorials show RAG patterns that are easy to adapt.

def retrieve(query: str, k: int = 6) -> str:
    """Return top-k passages from your store."""
    # e.g., call into FAISS/Chroma/Elasticsearch/etc.
    return "passage A\n\npassage B\n\n..."

agent = dspy.ReAct(SearchAndAnswer, tools=[retrieve, summarize], max_iters=6)
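For local experiments before a real index exists, a toy `retrieve` can rank a handful of passages by word overlap with the query. This is a deliberately naive sketch: the `PASSAGES` list and the overlap scoring are illustrative assumptions standing in for FAISS, Chroma, Elasticsearch, or similar, and it is not BM25.

```python
import re

# Hypothetical in-memory corpus standing in for a real vector or keyword store.
PASSAGES = [
    "DSPy is a framework for programming LLM systems.",
    "ReAct agents interleave reasoning steps with tool calls.",
    "MIPROv2 optimizes instructions and few-shot examples.",
]

def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> str:
    """Return the top-k passages ranked by word overlap with the query."""
    q = _tokens(query)
    ranked = sorted(PASSAGES, key=lambda p: len(q & _tokens(p)), reverse=True)
    return "\n\n".join(ranked[:k])

print(retrieve("What does MIPROv2 optimize?", k=1))
```

Because the function keeps the same `retrieve(query, k)` shape and string return type as the stub above, swapping in a real store later should not require changes to the agent.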

6) Make it measurable: metrics that matter

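Swap answer_exact_match for a semantic metric such as SemanticF1 when exact strings don’t tell the full story. You can also write custom metrics as standard Python callables; DSPy encourages it.

from dspy.evaluate import SemanticF1

# SemanticF1 reads example.question and compares example.response with pred.response,
# so the dev set and program scored here must use `question` and `response` fields.
evaluate = dspy.Evaluate(devset=devset, metric=SemanticF1())
print("SemF1 score:", evaluate(agent))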
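As an example of a custom metric written as a plain callable, here is a token-overlap F1. It is a cheap lexical score for illustration only, not a substitute for SemanticF1, and it assumes the `(example, pred, trace=None)` argument shape and an `answer` field on both objects. `SimpleNamespace` stands in for real examples and predictions.

```python
from collections import Counter
from types import SimpleNamespace

def token_f1(example, pred, trace=None) -> float:
    """Harmonic mean of token precision and recall between gold and predicted answers."""
    gold = example.answer.lower().split()
    guess = pred.answer.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(gold) & Counter(guess)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = SimpleNamespace(answer="William Shakespeare")
print(token_f1(gold, SimpleNamespace(answer="Shakespeare")))
```

A partial answer such as "Shakespeare" earns partial credit here, where exact match would score it as a miss.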

7) Auto-optimize with MIPROv2 (instructions + few-shots)

Let MIPROv2 search over instructions and few-shot demonstrations, keeping the candidates that score best on your metric.

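import dspy
from dspy.teleprompt import MIPROv2
from dspy.evaluate import answer_exact_match

# our base "student" program
student = dspy.Predict(AnswerQuestion)

# the optimizer
teleprompter = MIPROv2(
    metric=answer_exact_match,   # or your custom callable
    auto="medium"                # search effort
)

# devset is reused as the trainset only to show the call; use a separate, larger trainset in practice
optimized = teleprompter.compile(student=student, trainset=devset)

# Persist the optimized state, then reload it into a fresh instance of the same program
optimized.save("optimized.json")
better = dspy.Predict(AnswerQuestion)
better.load("optimized.json")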
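Optimizing and evaluating on the same examples tends to overstate gains, so it may be worth holding out part of your data before calling compile. A minimal sketch follows; `split_examples` is a hypothetical helper, and the 80/20 ratio and seed are illustrative assumptions, not DSPy defaults.

```python
import random

def split_examples(examples, train_fraction=0.8, seed=0):
    """Shuffle deterministically and split into (train, holdout) lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Plain dicts keep the sketch dependency-free; in practice these would be your examples.
data = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]
trainset, holdout = split_examples(data)
print(len(trainset), len(holdout))
```

You would then pass `trainset` to the optimizer and score both the baseline and the optimized program on `holdout` only.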

8) Production notes (the opinions we earned the hard way)

9) Comparisons (where DSPy fits)

(We’re careful here: these are different abstractions; “seamless” ≠ “same.”)

10) Common mistakes & how to avoid them

  1. Passing objects as tools
    • Fix: Use plain functions with type hints; pass them directly to ReAct.
  2. No metric, no dev set
    • You’ll fly blind and overfit anecdotes. Use Evaluate with a simple metric today; refine later.
  3. Believing there’s a built-in BM25 retriever
    • There isn’t. Wrap your own retriever or RAG function as a tool and test it explicitly.
  4. Inline keys + tracing
    • Don’t log secrets by accident if you enable tracing. Redact before you trace.
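For the last point, redaction can be a small filter applied to any string before it reaches a log or trace. The sketch below is an assumption-heavy illustration: the two patterns (an `sk-` style token and a `key=value` assignment) are guesses at common key shapes, `redact` is a hypothetical helper, and none of this is a DSPy or tracing-library API.

```python
import re

# Assumed key shapes; adjust these to the providers you actually use.
TOKEN_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
ASSIGNMENT_PATTERN = re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)(\S+)")

def redact(text: str) -> str:
    """Mask token-like strings and api-key assignments in a log line."""
    text = TOKEN_PATTERN.sub("[REDACTED]", text)
    # Keep the "api_key=" prefix so the log stays readable, mask only the value.
    return ASSIGNMENT_PATTERN.sub(lambda m: m.group(1) + "[REDACTED]", text)

print(redact("calling LM with OPENAI_API_KEY=supersecret"))
```

Pattern-based redaction will miss keys with unfamiliar shapes, so it works best as a safety net alongside keeping secrets out of traced inputs in the first place.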

Appendix: Full minimal example (baseline → eval → optimize)

import dspy
from dspy import Signature, Predict
from dspy.evaluate import answer_exact_match
from dspy.teleprompt import MIPROv2

# 1) LM config
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 2) Signature and baseline program
class AnswerQuestion(Signature):
    # Every signature field must be declared as an input or an output.
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

program = Predict(AnswerQuestion)

# 3) Dev set + Evaluate (Evaluate expects dspy.Example objects with inputs marked)
devset = [
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="Shakespeare").with_inputs("question"),
]

evaluate = dspy.Evaluate(devset=devset, metric=answer_exact_match)
print("Baseline:", evaluate(program))

# 4) Auto-optimize with MIPROv2
# devset is reused as the trainset only to keep the example short; use separate, larger sets in practice
miprov2 = MIPROv2(metric=answer_exact_match, auto="medium")
optimized = miprov2.compile(student=program, trainset=devset)

# 5) Save the optimized state and load it into a fresh instance of the same program
optimized.save("qa_optimized.json")
reloaded = Predict(AnswerQuestion)
reloaded.load("qa_optimized.json")
print("After optimization:", evaluate(reloaded))

All APIs above reflect current docs for ReAct, Evaluate, metrics, and MIPROv2, and the save/load behavior shown in the optimizer/evaluation pages.

Final checklist so you ship confidently

Tega Adeyemi, November 10, 2025