Engineering · 14 min read

Shipping Agents That Think in Code: A Practical, Opinionated Guide to Hugging Face smolagents

Build smarter AI agents: see how smolagents lets models “think in code” for safer control, faster prototyping, and more power than JSON tool-calling.

Tega Adeyemi

A practical, code-heavy guide to Hugging Face’s smolagents: how CodeAgent and ToolCallingAgent actually work, how to run them safely in production, and how they compare to LangChain, LangGraph, CrewAI, and friends—with concrete patterns, gotchas, and implementation tips for developers and AI leaders.

1. Why “agents that think in code” is a big deal

Most “AI agent” frameworks today do something like this:

LLM → JSON tool calls → glue code → more LLM calls.

It works… until we want:

smolagents takes a different stance:

The agent’s “brain” is Python code.

Instead of asking a model to emit opaque tool-call blobs, a CodeAgent writes and runs Python directly to call tools, loop, branch, and transform data. The framework is thin: ~1k LOC of orchestration on top of raw code.

For us as engineers, this has two huge implications:

  1. We can read the agent’s reasoning as code, line by line.
  2. We can control the execution environment like any other Python runtime (security, sandboxing, observability).

In this guide, we’ll go deep on:

We’ll assume you’re comfortable with Python, LLM APIs, and modern agent frameworks.

2. Quick mental model of smolagents

From the official docs:

"smolagents is an open-source Python library designed to make it extremely easy to build and run agents using just a few lines of code."

Core concepts:

Installation

pip install "smolagents[toolkit]"  # includes default tools like web search

This matches the official quickstart.

3. CodeAgent vs ToolCallingAgent (and when to use which)

CodeAgent – the “think in code” agent

A CodeAgent does this:

  1. The LLM receives the tools’ APIs and the task.
  2. It proposes Python code that:
    • Calls tools
    • Does control flow
    • Aggregates results
  3. The framework executes that code in a sandboxed Python environment.
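The three steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not smolagents' actual executor: `fake_model`, `run_code_step`, `add_numbers`, and `SAFE_BUILTINS` are hypothetical names, and trimming builtins with `exec` is not a real security boundary.

```python
# Toy sketch of one CodeAgent-style step. NOT smolagents internals:
# fake_model, run_code_step, add_numbers and SAFE_BUILTINS are hypothetical.

# The only builtins the generated code is allowed to see.
SAFE_BUILTINS = {"range": range, "sum": sum, "len": len}


def add_numbers(values):
    """Stand-in 'tool' that the generated code may call."""
    return sum(values)


def fake_model(task: str) -> str:
    """Pretend LLM: answers with Python source instead of a JSON tool call."""
    return "result = add_numbers(range(1, 11))"


def run_code_step(task: str, tools: dict) -> object:
    """Ask the model for code, run it in a trimmed namespace, return `result`."""
    code = fake_model(task)
    namespace = {"__builtins__": SAFE_BUILTINS, **tools}
    exec(code, namespace)  # illustration only; use a real sandbox in practice
    return namespace["result"]


answer = run_code_step(
    "Calculate the sum of numbers from 1 to 10",
    {"add_numbers": add_numbers},
)
print(answer)  # 55
```

The point of the sketch is the shape of the loop: the model's output is source code, the tools are just names in the namespace, and everything the agent did is readable after the fact.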

Minimal example from the quickstart, adapted:

from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel()  # HF Inference, default model
agent = CodeAgent(tools=[], model=model)

print(agent.run("Calculate the sum of numbers from 1 to 10"))

This matches the official docs structure and APIs.

When to use CodeAgent:

ToolCallingAgent – classic JSON tool calling

A ToolCallingAgent behaves more like OpenAI tool calling:

From the guided tour:

from smolagents import ToolCallingAgent, InferenceClientModel

model = InferenceClientModel()
agent = ToolCallingAgent(tools=[], model=model)

agent.run("Explain how you would solve 24 * 7 without a calculator.")

When to use ToolCallingAgent:

How we think about choosing

We’ll show an example of this pattern later.

4. Getting started: a web-searching code agent

Let’s wire up a basic agent that uses web search plus Python reasoning.

From the docs, there’s a built-in DuckDuckGo search tool:

from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

model = InferenceClientModel()  # uses your HF_TOKEN under the hood

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

answer = agent.run("What's the capital of Nigeria, and what's 3x its population (approx)?")
print(answer)

What happens:

  1. The agent writes code that calls DuckDuckGoSearchTool.
  2. It parses the result, extracts numbers, and does the math in Python.
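  3. You get a single final answer, plus a step-by-step record of the run (agent.memory.steps; agent.logs in older releases) that you can introspect.

Developer tip:
Use agent.memory.steps and agent.write_memory_to_messages() (write_inner_memory_from_logs() in older releases) when debugging. The latter converts the recorded steps into LLM-readable messages so a second pass can “reflect” on a run.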

5. Building your own tools (the right way)

You’ll spend most of your time here.

Option 1: Decorated function with @tool

This is the idiomatic pattern from the docs:

from huggingface_hub import list_models
from smolagents import tool

@tool
def most_downloaded_model(task: str) -> str:
    """
    Return the most downloaded model name for a given task on the HF Hub.

    Args:
        task: The task to filter models by, e.g. "text-classification".
    """
    models = list_models(filter=task, sort="downloads", direction=-1)
    model = next(models)
    return model.id

Then:

from smolagents import CodeAgent, InferenceClientModel  # model abstraction for HF Inference

model = InferenceClientModel()  # named HfApiModel in older releases
agent = CodeAgent(tools=[most_downloaded_model], model=model)

print(agent.run(
    "Which model has the most downloads for the 'text-classification' task?"
))

smolagents inspects the function signature and docstring to generate the tool schema the LLM will see.
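To make that concrete, here is a simplified picture of how a schema can be derived from a signature and docstring using only the standard library. `describe_tool` and `PY_TO_JSON` are hypothetical names for this sketch; the library's real implementation is more thorough (it also reads per-argument descriptions from the docstring).

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a minimal tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    inputs = {
        name: {"type": PY_TO_JSON.get(hints.get(name), "any")}
        for name in params
    }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputs": inputs,
        "output_type": PY_TO_JSON.get(hints.get("return"), "any"),
    }


def most_downloaded_model(task: str) -> str:
    """Return the most downloaded model name for a given task on the HF Hub."""
    raise NotImplementedError  # body is irrelevant for schema generation


schema = describe_tool(most_downloaded_model)
print(schema)
```

This is why type hints and a clear docstring matter: they are the only description of the tool the model ever sees.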

Option 2: Subclass Tool

Use this if:

Pattern (simplified from docs):

from smolagents import Tool

class SqlQueryTool(Tool):
    name = "sql_query"
    description = "Run a read-only SQL query against the analytics DB."

    inputs = {"query": {"type": "string", "description": "The SQL query to run"}}
    output_type = "string"

    def __init__(self, engine):
        super().__init__()
        self.engine = engine

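    def forward(self, query: str) -> str:
        # Assumes `engine` is a SQLAlchemy engine; 2.x requires text() for raw SQL.
        from sqlalchemy import text

        # IMPORTANT: enforce read-only, parameterized queries, etc.
        # This prefix check is a minimal guard; parse & validate properly in production.
        if not query.lstrip().lower().startswith("select"):
            raise ValueError("Only SELECT queries are allowed.")
        with self.engine.connect() as conn:
            rows = conn.execute(text(query)).fetchmany(50)
        return str(rows)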

We strongly recommend hard-coding any security constraints (read-only, whitelists, row limits) inside the tool.
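As a minimal sketch of what "constraints inside the tool" can look like, the snippet below enforces a single SELECT statement and a row cap in code rather than in the prompt. It uses the standard-library sqlite3 module so it runs anywhere; `run_read_only_query` and `MAX_ROWS` are hypothetical names, and the string checks are deliberately naive (they reject CTEs and any query containing a semicolon). A real deployment would also connect with a read-only database role.

```python
import sqlite3

MAX_ROWS = 50  # hard row cap enforced by the tool, not by the prompt


def run_read_only_query(conn: sqlite3.Connection, query: str) -> list:
    """Hypothetical guard: allow a single SELECT statement and cap the rows."""
    stripped = query.strip().rstrip(";")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    if not stripped.lower().startswith("select"):
        raise ValueError("Only SELECT queries are allowed")
    return conn.execute(stripped).fetchmany(MAX_ROWS)


# Tiny in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(1, "signup"), (2, "login")]
)

rows = run_read_only_query(conn, "SELECT name FROM events ORDER BY id")
print(rows)  # [('signup',), ('login',)]
```

Whatever the agent writes, the worst it can do through this tool is read at most 50 rows.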

6. Secure code execution: how to not brick your infra

Let’s address the scary part: we’re letting an LLM write code and then executing it.

smolagents tackles this with multiple layers:

  1. Sandboxed executor
    The LLM’s code runs in a dedicated Python environment with:
    • Restricted imports
    • Strict network & filesystem controls (depending on backend)
    • Timeouts and error handling
  2. Whitelisted imports via additional_authorized_imports
from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel()  # named HfApiModel in older releases
agent = CodeAgent(
    tools=[most_downloaded_model],
    model=model,
    additional_authorized_imports=["requests", "bs4"],  # explicit whitelist
)
  3. Configurable executor backends via executor_type

In recent versions, you configure the execution environment with executor_type:

agent = CodeAgent(
    tools=[most_downloaded_model],
    model=model,
    executor_type="e2b",           # or "docker", "blaxel", "modal", "local"
    additional_authorized_imports=[],
)

Rough mental model:

  4. Illegal operations fail fast

Attempts to:

  1. Access disallowed paths
  2. Open external sockets (if disabled)
  3. Import non-whitelisted modules

…will fail with an exception that you can catch and log or surface to the user.
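The import check in particular is easy to picture. The sketch below statically scans generated code for imports outside an allowlist before anything runs. It is an illustration of the idea, not the library's implementation: `find_disallowed_imports` and `BASE_ALLOWED` are hypothetical, and the default set here is made up for the example.

```python
import ast

# Illustrative defaults only; not the library's actual list.
BASE_ALLOWED = {"math", "json", "re"}


def find_disallowed_imports(code: str, extra_allowed=()) -> list:
    """Return top-level module names imported by `code` that are not allowed."""
    allowed = BASE_ALLOWED | set(extra_allowed)
    blocked = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is governed by "os"
            if root not in allowed:
                blocked.append(root)
    return blocked


snippet = "import math\nimport subprocess\nfrom os import path\n"
print(find_disallowed_imports(snippet))
print(find_disallowed_imports("import requests", extra_allowed=["requests"]))
```

The second call mirrors what additional_authorized_imports does conceptually: the allowlist grows only where you explicitly say so.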

Practical security checklist

For a production-ish environment, we’d:

7. Example: a developer “research & prototype” agent

Let’s build something a senior engineer would actually use:

“Given a GitHub repo and a natural language request, figure out where to make changes and propose a patch.”

We’ll keep it simple but realistic:

Tools (simplified)

from smolagents import tool

@tool
def fetch_repo_file(repo: str, path: str, ref: str = "main") -> str:
    """
    Fetch the content of a file from a GitHub repo.
    Args:
        repo: "owner/repo" string.
        path: file path in the repo.
        ref: branch or commit sha.
    """
    # In production, use GitHub API with proper auth & rate limiting.
    import requests

    url = f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text


@tool
def search_codebase(repo: str, query: str, top_k: int = 5) -> str:
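    """
    Return up to top_k code snippets that seem related to `query`.
    Currently a stub; in prod, you'd use an embedding-based index.
    Args:
        repo: "owner/repo" string.
        query: natural-language or keyword search query.
        top_k: maximum number of snippets to return.
    """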
    # For now, we just say "not implemented" and let the agent handle it.
    return f"Search for '{query}' in repo '{repo}' is not implemented yet."

Agent

from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct"  # strong coding model
)

dev_agent = CodeAgent(
    tools=[fetch_repo_file, search_codebase],
    model=model,
    executor_type="docker",  # safer than local for arbitrary code
    additional_authorized_imports=["requests"],
)

task = """
We want to add basic rate limiting to our FastAPI endpoints in the repo
`cohorte/example-api`. Find the main router file and propose a patch
that adds a simple in-memory rate limiter for the `/v1/completions` route.
Return a unified diff.
"""

print(dev_agent.run(task))

Why this is nice for developers:

8. Example: multi-agent system with a manager and specialists

smolagents doesn’t force a huge graph engine on you, but it’s perfectly capable of multi-agent setups. Many GAIA benchmark projects use smolagents as a base: a manager agent, retrieval specialist, logic specialist, browser specialist, etc.

Let’s sketch a minimal version:

Three agents

We’ll use:

Research agent
from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

research_model = InferenceClientModel()
research_agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=research_model,
    executor_type="local",  # least isolated option; acceptable only when tools are simple and trusted
)
Coding agent
from smolagents import CodeAgent, InferenceClientModel

coding_model = InferenceClientModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct"
)

coding_agent = CodeAgent(
    tools=[],  # you can add internal tools here
    model=coding_model,
    executor_type="e2b",  # strong sandbox for arbitrary code
)
Manager (ToolCallingAgent calling agents as tools)

We expose research_tool and coding_tool as tools that invoke the two other agents. This pattern is used in real multi-agent repos built on smolagents.

from smolagents import ToolCallingAgent, tool, InferenceClientModel

@tool
def research_tool(question: str) -> str:
    """
    Ask the research agent to investigate a question using web search and
    return a concise, cited summary.
    Args:
        question: the question to research.
    """
    return research_agent.run(
        f"Research this question and answer concisely with sources:\n{question}"
    )

@tool
def coding_tool(spec: str) -> str:
    """
    Ask the coding agent to write Python code that implements the spec.
    Return the code only.
    Args:
        spec: plain-language description of what the code should do.
    """
    return coding_agent.run(
        f"Write Python code only, no explanation, for this spec:\n{spec}"
    )

manager_model = InferenceClientModel()
manager_agent = ToolCallingAgent(
    tools=[research_tool, coding_tool],
    model=manager_model,
)

print(manager_agent.run(
    "Build a small Python script that prints the latest AI news headline "
    "and the number of characters in it."
))

Flow:

  1. Manager decides: call research_tool to get latest AI news.
  2. Manager then uses coding_tool with a spec describing what to code.
  3. You, or another orchestrator, decide how to execute that code (e.g., via the coding agent, CI pipeline, or human review).

9. How smolagents compares to other frameworks

We use a bunch of frameworks in practice; here’s how we’d compare them, focusing on developer ergonomics and control, not who’s “best”.

vs LangChain

LangChain:

smolagents:

Rule of thumb:

vs LangGraph

LangGraph:

smolagents:

We’ve seen teams use LangGraph to orchestrate multiple smolagents as “leaf” nodes where code execution is needed.

vs CrewAI / AutoGen / custom frameworks

CrewAI / AutoGen:

smolagents:

If you want:

10. Implementation tips that save you hours

10.1 Choose models explicitly

The defaults are fine for demos, but in practice:

from smolagents import InferenceClientModel, LiteLLMModel, OpenAIModel

# HF Inference - good general choice, variety of OSS models
hf_model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# OpenAI via dedicated wrapper (needs smolagents[openai])
openai_model = OpenAIModel(model_id="gpt-4.1-mini")

# Or via LiteLLM (smolagents[litellm]) for many providers in one
litellm_model = LiteLLMModel(model_id="gpt-4.1-mini")

This matches the model abstractions in the docs.

Tip: use cheaper models for “glue” agents (manager, routing, simple classification) and focus spend on the code-heavy or research-heavy agents.

10.2 Telemetry & observability from day one

smolagents gives you:

Patterns we like:

This pays off fast when something behaves weirdly in production.
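One cheap way to get started is to record every tool call yourself. The decorator below is a generic Python pattern, not a smolagents API: `traced` and `TOOL_CALLS` are hypothetical names, and in a real system the records would go to your tracing backend instead of an in-memory list.

```python
import functools
import time

TOOL_CALLS = []  # stand-in sink; ship these to your tracing backend instead


def traced(func):
    """Record name, outcome and duration of every call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            status = "error"
            raise  # never swallow the error; just record it
        finally:
            TOOL_CALLS.append(
                {
                    "tool": func.__name__,
                    "status": status,
                    "seconds": time.perf_counter() - start,
                }
            )

    return wrapper


@traced
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


count = word_count("agents that think in code")
print(count, TOOL_CALLS)
```

With records like these, "sometimes it works, sometimes it doesn't" turns into a concrete question about which tool failed, with which status, and how long it took.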

10.3 Memory & state: don’t overcomplicate it early

smolagents has a memory tutorial and utilities for persistent state.

Our take:

Over-eager memory tends to create debugging nightmares.

10.4 When in doubt, push complexity into tools

If your agent’s prompts get long and fragile, that’s usually a smell that a tool should exist.

Examples:

Let the LLM:

But don’t let it re-invent your business logic or security posture.

11. Common pitfalls & how to avoid them

Pitfall 1: “It’s fine, it’s just a prototype” (no sandbox)

Issue: Running CodeAgent with local execution and broad imports on a shared dev machine or server.

Fix:

Pitfall 2: Giving the agent infinite power via imports

Issue: Letting the agent import subprocess, os, shutil, etc., directly.

Fix:

Pitfall 3: Debugging by vibes only

Issue: “Sometimes it works, sometimes it doesn’t, not sure why.”

Fix:

Pitfall 4: Trying to build a full orchestration layer inside a single agent

Issue: Giant prompts and monolithic agents that handle planning, execution, evaluation, and retries.

Fix:

12. Key takeaways (for busy AI VPs & tech leads)

If you skimmed everything else, here’s the short version:

  1. CodeAgent is a superpower
    Letting agents “think in code” gives you natural loops, conditionals, and function composition, while keeping everything auditable as Python.
  2. Security is manageable with the right setup
    Use executor_type with isolated sandboxes (Docker, Modal, E2B, Blaxel), strict import whitelists, and narrow tools around sensitive resources.
  3. smolagents plays well with the rest of your stack
    It’s model-agnostic and tool-agnostic (MCP, LangChain tools, Spaces as tools), so you don’t have to pick a side in “framework wars”.
  4. Great fit for high-leverage teams
    • Research & prototyping agents
    • Advanced RAG agents (GAIA-style reasoning, open-deep-research setups)
    • Multi-agent architectures where you really want to inspect and control each step
  5. The winning pattern is boring and reliable
    • Tools encapsulate security & business logic.
    • CodeAgents glue those tools together with code you can read.
    • Orchestration is plain Python; evaluation and telemetry are first-class citizens.
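"Orchestration is plain Python" can be as small as a retry loop around an agent call. In this sketch, `run_with_retries` is a hypothetical helper and `flaky_agent` is a stub standing in for something like agent.run; the validation rule (the output must look like a diff) is made up for the example.

```python
def run_with_retries(run_fn, task, is_valid, max_attempts=3):
    """Call run_fn(task) until is_valid(output) passes or attempts run out."""
    last = None
    for attempt in range(1, max_attempts + 1):
        last = run_fn(task)
        if is_valid(last):
            return last, attempt
    raise RuntimeError(
        f"No valid answer after {max_attempts} attempts: {last!r}"
    )


# Stub agent: returns an empty answer first, then something diff-shaped.
_calls = {"n": 0}


def flaky_agent(task):
    _calls["n"] += 1
    return "" if _calls["n"] < 2 else "diff --git a/app.py b/app.py"


result, attempts = run_with_retries(
    flaky_agent,
    "propose a patch",
    is_valid=lambda out: out.startswith("diff"),
)
print(result, attempts)  # diff --git a/app.py b/app.py 2
```

Keeping retries and validation outside the agent means the prompt stays short and the control flow stays testable.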

If we had to sum it up:

Use smolagents when you want agents to be code, not just call code.

It’s minimal enough that your team can understand it in an afternoon, and capable enough to drive serious GAIA-level multi-agent systems with the right models and tooling.

Tega Adeyemi, December 8, 2025