
OpenTelemetry GenAI Semantic Conventions

Guide: Instrument LLMs with OpenTelemetry GenAI conventions—portable traces for chat, tools & RAG. Debug faster, swap vendors safely.

Tega Adeyemi

Standardize traces for LLMs, tools, and RAG so observability survives model swaps, vendor changes, and “agent sprawl.”

We’re watching observability become a first-class feature of AI engineering. Not because dashboards are cool—but because LLM systems are inherently distributed:

If we instrument each piece with different naming conventions (or worse: vendor-specific schemas), we get “telemetry soup.” The OpenTelemetry GenAI semantic conventions give teams a common vocabulary for spans and attributes, so traces remain meaningful even when we swap models/providers or reorganize agent workflows. (And yes, your future self will thank you.)

OpenTelemetry’s GenAI semantic conventions define well-known operation names such as chat, embeddings, execute_tool, and invoke_agent. They’re currently marked with Development stability, so expect attribute names and values to change between releases; you can still implement them today with guardrails.
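
One possible guardrail, sketched under our own assumptions: keep every attribute key and operation name in a single module, so a rename in a future spec revision becomes a one-line change. The module layout, the `base_attributes` helper, and the `KNOWN_OPERATIONS` set are hypothetical names for illustration; only the attribute keys and operation names come from the conventions discussed here.

```python
# Hypothetical module (e.g. genai_semconv.py): the single place where
# GenAI attribute keys and operation names are spelled out.
GEN_AI_OPERATION_NAME = "gen_ai.operation.name"
GEN_AI_PROVIDER_NAME = "gen_ai.provider.name"
GEN_AI_REQUEST_MODEL = "gen_ai.request.model"

# Operation names mentioned in this article; extend as the spec evolves.
KNOWN_OPERATIONS = frozenset({"chat", "embeddings", "execute_tool", "invoke_agent"})


def base_attributes(operation: str, provider: str, model: str) -> dict:
    """Build the minimal attribute set for a GenAI span.

    Raises ValueError on an unknown operation so typos fail in tests
    instead of silently creating a new span category.
    """
    if operation not in KNOWN_OPERATIONS:
        raise ValueError(f"unknown GenAI operation: {operation!r}")
    return {
        GEN_AI_OPERATION_NAME: operation,
        GEN_AI_PROVIDER_NAME: provider,
        GEN_AI_REQUEST_MODEL: model,
    }


attrs = base_attributes("chat", "openai", "gpt-4.1-mini")
```

The resulting dict can be handed to a span in one call, which keeps string literals out of application code.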

Semantic conventions vs “vendor observability”

Let’s compare the two camps:

1) Vendor/platform conventions

Pros:

Cons:

2) OpenTelemetry semantic conventions (the “portable standard”)

Pros:

Cons:

Also worth noting: there are adjacent/open efforts like OpenInference (popular in the LLM observability community) that define their own semantic attributes. You can map between schemas, but the strategic win is choosing a “source of truth” early and being consistent.
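
As a rough illustration of mapping between schemas, the sketch below renames a handful of OpenInference-style keys to their GenAI counterparts. The key pairs are our best understanding of the two schemas and should be checked against the current versions of both specs; `OPENINFERENCE_TO_GENAI` and `to_genai` are hypothetical names.

```python
# Assumed key pairs (OpenInference -> OTel GenAI); verify against both specs
# before relying on them, since both schemas are still evolving.
OPENINFERENCE_TO_GENAI = {
    "llm.model_name": "gen_ai.request.model",
    "llm.provider": "gen_ai.provider.name",
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.token_count.completion": "gen_ai.usage.output_tokens",
}


def to_genai(attributes: dict) -> dict:
    """Rename known OpenInference keys; pass every other key through unchanged."""
    return {OPENINFERENCE_TO_GENAI.get(key, key): value for key, value in attributes.items()}


converted = to_genai({
    "llm.model_name": "gpt-4.1-mini",
    "llm.token_count.prompt": 42,
    "session.id": "abc",  # unrelated attribute, left as-is
})
```

A mapping like this works best at one boundary (for example, a Collector processor or an export hook), so the rest of the codebase only ever sees the schema you picked as the source of truth.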

The GenAI span model in practice

OpenTelemetry’s GenAI conventions standardize things like:

That last bullet is the difference between “helpful traces” and “we accidentally logged customer secrets.”

A working implementation

Below is a practical implementation that:

  1. configures OpenTelemetry tracing,
  2. creates a GenAI span around an LLM request,
  3. works with the OpenAI Responses API,
  4. optionally works with LM Studio as an OpenAI-compatible local server.

Step 0: Install dependencies

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc openai

Step 1: Configure OpenTelemetry + OTLP exporter

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({
    "service.name": "genai-demo",
    "service.version": "0.1.0",
})

provider = TracerProvider(resource=resource)
trace.set_tracer_provider(provider)

# Export to an OTLP endpoint (Collector / vendor / gateway)
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

tracer = trace.get_tracer("genai-demo")

Tip: In real deployments, prefer standard OpenTelemetry env vars (e.g., OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT) instead of hardcoding.

Step 2: Instrument an OpenAI Responses API call (correct API shape)

OpenAI’s Responses API uses client.responses.create(...).

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_answer(user_text: str) -> str:
    model = "gpt-4.1-mini"  # pick what you actually use

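    # Model calls are outbound requests, so mark the span kind as CLIENT
    # (uses the `trace` module imported in Step 1).
    with tracer.start_as_current_span(f"chat {model}", kind=trace.SpanKind.CLIENT) as span: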
        # GenAI semantic attributes (keep them small & safe)
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.output.type", "text")

        # ⚠️ Avoid logging raw prompts by default (privacy/security)
        # If you must, consider redaction/truncation + explicit opt-in.

        resp = client.responses.create(
            model=model,
            input=user_text,
        )

        # Responses API returns content in a structured form; a common helper is output_text.
        # Keep your code aligned to the official docs for your SDK version.
        text = getattr(resp, "output_text", None)
        if text is None:
            # Fallback: handle structured output if output_text isn't available
            text = str(resp)

        return text
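
The span above only records request-side attributes. A minimal sketch for the response side follows, assuming the response object exposes `id`, `model`, and a `usage` object with `input_tokens` and `output_tokens`; check this against your SDK version. `response_attributes` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in so the example runs without a network call.

```python
from types import SimpleNamespace

# (response field, GenAI attribute key) pairs for token usage.
_USAGE_FIELDS = (
    ("input_tokens", "gen_ai.usage.input_tokens"),
    ("output_tokens", "gen_ai.usage.output_tokens"),
)


def response_attributes(resp) -> dict:
    """Collect response-side GenAI attributes, skipping anything missing."""
    attrs = {}
    if getattr(resp, "id", None):
        attrs["gen_ai.response.id"] = resp.id
    if getattr(resp, "model", None):
        attrs["gen_ai.response.model"] = resp.model
    usage = getattr(resp, "usage", None)
    if usage is not None:
        for field, key in _USAGE_FIELDS:
            value = getattr(usage, field, None)
            if value is not None:
                attrs[key] = value
    return attrs


# Stand-in for a real SDK response, for demonstration only.
fake_resp = SimpleNamespace(
    id="resp_123",
    model="gpt-4.1-mini",
    usage=SimpleNamespace(input_tokens=12, output_tokens=34),
)
response_attrs = response_attributes(fake_resp)
```

Inside generate_answer, this would be called right after the request returns, e.g. `span.set_attributes(response_attributes(resp))`. Token counts are small, non-sensitive, and usually the first thing people ask for when costs spike.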

Step 3: Swap OpenAI for LM Studio locally (same instrumentation)

LM Studio’s API server is designed to be OpenAI-compatible and documents endpoints like /v1/chat/completions, /v1/embeddings, and /v1/responses.

from openai import OpenAI

# LM Studio default is often http://localhost:1234/v1
client = OpenAI(
    api_key="lm-studio",  # LM Studio typically doesn’t require a real key
    base_url="http://localhost:1234/v1",
)

def local_answer(user_text: str) -> str:
    model = "your-local-model-name"

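    # Same span shape as the hosted call: an outbound CLIENT span
    # (uses the `trace` module imported in Step 1).
    with tracer.start_as_current_span(f"chat {model}", kind=trace.SpanKind.CLIENT) as span: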
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")  # schema-wise it's OpenAI-compatible
        span.set_attribute("gen_ai.request.model", model)

        resp = client.responses.create(model=model, input=user_text)
        return getattr(resp, "output_text", str(resp))
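
Since the only differences between Step 2 and Step 3 are the client arguments, one option is to select them by configuration rather than keeping two copies of the code. The sketch below is an assumption-heavy illustration: `BACKENDS`, `client_kwargs`, and the backend names are hypothetical, and the LM Studio URL is simply the default mentioned above.

```python
import os

# Hypothetical backend table; adjust URLs and key sources to your setup.
BACKENDS = {
    "openai": {"base_url": None, "api_key_env": "OPENAI_API_KEY"},
    "lmstudio": {"base_url": "http://localhost:1234/v1", "api_key_env": None},
}


def client_kwargs(backend: str, env=os.environ) -> dict:
    """Return keyword arguments for OpenAI(...) for the chosen backend."""
    cfg = BACKENDS[backend]
    kwargs = {}
    if cfg["base_url"]:
        kwargs["base_url"] = cfg["base_url"]
    if cfg["api_key_env"]:
        kwargs["api_key"] = env[cfg["api_key_env"]]
    else:
        kwargs["api_key"] = "lm-studio"  # placeholder; no real key needed locally
    return kwargs


local_kwargs = client_kwargs("lmstudio")
```

With this in place, `client = OpenAI(**client_kwargs("lmstudio"))` swaps the backend while the instrumented function stays untouched.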

Real-world implementation tips

Don’t record raw prompts by default

OpenTelemetry’s GenAI spec explicitly warns that some attributes may contain sensitive information and recommends opt-in + filtering/truncation approaches.
Practical pattern:
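
One way to sketch an opt-in, redact, then truncate flow is shown below. Everything named here is an assumption for illustration: the `GENAI_CAPTURE_PROMPTS` flag is not an official OpenTelemetry variable, the `app.prompt.*` attribute keys are custom, and the two regexes are deliberately simple and will not catch every secret.

```python
import os
import re

# Hypothetical opt-in flag; not an official OpenTelemetry variable.
CAPTURE_ENV = "GENAI_CAPTURE_PROMPTS"
MAX_CHARS = 200

# Deliberately simple patterns; real deployments need a broader redaction list.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_API_KEY = re.compile(r"\bsk-[A-Za-z0-9_-]{8,}\b")


def redact(text: str) -> str:
    """Mask obvious secrets, then cap the length."""
    text = _EMAIL.sub("[email]", text)
    text = _API_KEY.sub("[secret]", text)
    if len(text) > MAX_CHARS:
        text = text[:MAX_CHARS] + "...[truncated]"
    return text


def prompt_attributes(prompt: str, env=os.environ) -> dict:
    """Always record the prompt length; record content only on explicit opt-in."""
    attrs = {"app.prompt.length": len(prompt)}  # custom, non-sensitive attribute
    if env.get(CAPTURE_ENV, "").lower() == "true":
        attrs["app.prompt.redacted"] = redact(prompt)
    return attrs


default_attrs = prompt_attributes("hello", env={})
```

The default path records only a length, which is often enough to spot runaway prompts without storing any user text.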

Use separate spans for the parts you’ll actually debug

A useful trace breakdown looks like:

Those operation names are standardized in the GenAI conventions list.
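
To make the naming concrete, here is a small sketch that builds span names and attributes for a hypothetical agent request. It assumes the "{operation} {target}" span-name pattern used in the steps above and that the target is stored under `gen_ai.agent.name`, `gen_ai.tool.name`, or `gen_ai.request.model` depending on the operation; the agent, tool, and model names are made up.

```python
# Which attribute carries the span's target, per operation (assumed mapping).
TARGET_ATTRIBUTE = {
    "invoke_agent": "gen_ai.agent.name",
    "embeddings": "gen_ai.request.model",
    "execute_tool": "gen_ai.tool.name",
    "chat": "gen_ai.request.model",
}


def span_spec(operation: str, target: str):
    """Return (span name, attributes) for one step of a GenAI workflow."""
    name = f"{operation} {target}"
    attributes = {
        "gen_ai.operation.name": operation,
        TARGET_ATTRIBUTE[operation]: target,
    }
    return name, attributes


# Hypothetical request: the agent retrieves context, calls a tool, then answers.
plan = [
    span_spec("invoke_agent", "support_agent"),
    span_spec("embeddings", "text-embedding-3-small"),
    span_spec("execute_tool", "search_docs"),
    span_spec("chat", "gpt-4.1-mini"),
]
span_names = [name for name, _ in plan]
```

Each pair can be passed straight to `tracer.start_as_current_span(name, attributes=attributes)`, with the last three spans opened inside the invoke_agent span so they show up as its children.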

Keep attribute payloads small

Even if you can record large tool definitions or message arrays, the spec warns these can be large and shouldn’t be on by default.
Engineers love observability… right up until the collector bill arrives.
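
A minimal sketch of a size budget applied before attributes reach a span is shown below. `cap_attributes` and its default limits are hypothetical; the OpenTelemetry SDK also offers its own limits (for example the OTEL_ATTRIBUTE_VALUE_LENGTH_LIMIT environment variable), which are worth checking for your SDK version.

```python
def cap_attributes(attrs: dict, max_chars: int = 256, max_items: int = 8) -> dict:
    """Return a copy of attrs with long strings and long sequences trimmed."""
    capped = {}
    for key, value in attrs.items():
        if isinstance(value, str) and len(value) > max_chars:
            value = value[:max_chars]
        elif isinstance(value, (list, tuple)) and len(value) > max_items:
            value = list(value[:max_items])
        capped[key] = value
    return capped


capped = cap_attributes(
    {
        "app.tool.names": [f"tool_{i}" for i in range(20)],  # custom attribute
        "app.note": "n" * 1000,  # custom attribute
        "gen_ai.request.model": "gpt-4.1-mini",
    }
)
```

Applying the cap in one helper keeps the limits visible in code review, instead of scattering slice expressions across call sites.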

Quick comparisons you’ll get asked in leadership meetings

“How is this different from OpenInference?”

OpenInference defines an LLM-oriented attribute model used by parts of the community and tooling ecosystem. You can use it, but if your platform strategy is “OpenTelemetry everywhere,” the GenAI semantic conventions reduce fragmentation across services and languages.

“Does this lock us into OpenAI?”

No. gen_ai.provider.name has well-known values for many providers (OpenAI, Anthropic, Bedrock, Azure OpenAI, etc.), and the operation naming stays consistent across them.
That’s the entire point: swap providers without rewriting your observability story.

Key takeaways

Tega Adeyemi, December 15, 2025