Introduction

Composo’s tracing SDK enables you to capture and evaluate LLM calls from your agent applications in real time. It currently supports DIY agents built directly on the OpenAI, Anthropic, and Google GenAI SDKs, with support for LangChain/LangGraph and other frameworks to come.

Why Tracing Matters

Many agent frameworks abstract away the underlying LLM calls, making it difficult to understand what’s happening under the hood and to evaluate performance effectively. Many evaluation platforms only let you send traces to a remote system and wait to view the results later. Composo gives you the best of both worlds: trace and evaluate immediately in your code, or view your traces in our platform, your own observability tooling, spreadsheets, or CI/CD. By instrumenting your LLM calls and marking agent boundaries, you can evaluate performance in real time and act on the results before output reaches your users.

Key Features

  • Mark Agent Boundaries: Use AgentTracer context manager or @agent_tracer decorator to define which LLM calls belong to which agent
  • Hierarchical Tracing: Support for nested agents to model complex multi-agent architectures
  • Independent Evaluation: Each agent’s performance is evaluated separately with average, min, max and standard-deviation statistics reported per agent
  • Flexible Evaluation: Get evaluation results instantly in your code, or view traces in the Composo platform for deeper analysis, with seamless sync to observability platforms like Grafana, Sentry, Langfuse, LangSmith, and Braintrust
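As an illustrative sketch (not part of the SDK), the per-agent summary statistics described above can be reproduced from a list of raw scores with Python’s statistics module; the function and field names here are assumptions:

```python
from statistics import mean, pstdev

def summarize_agent_scores(scores):
    """Aggregate raw per-agent evaluation scores into an
    average / min / max / standard-deviation summary."""
    return {
        "average": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "std_dev": pstdev(scores),
    }

# e.g. four evaluation scores for a single agent
summary = summarize_agent_scores([0.8, 0.9, 0.7, 1.0])
```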

Framework Support

  • Currently Supported:
    • Agents built on OpenAI LLMs
    • Agents built on Anthropic LLMs
    • Agents built on Google GenAI LLMs
  • Coming Soon: LangChain, OpenAI Agents, and other popular frameworks

Quickstart

This guide walks you through adding tracing to your agent application in 3 steps. We’ll start with a simple multi-agent application and add tracing incrementally.

Starting Code

Here’s a simple multi-agent application we want to trace:
from openai import OpenAI

open_ai_client = OpenAI()

def agent_2():
    return open_ai_client.chat.completions.create(
        model="gpt-4o-mini",
        max_tokens=5,
        messages=[{"role": "user", "content": "B"}],
    )

# Orchestrator agent
response1 = open_ai_client.chat.completions.create(
    model="gpt-4o-mini",
    max_tokens=5,
    messages=[{"role": "user", "content": "A"}],
)

response2 = agent_2()

Step 1: Install and Initialize

Install the Composo SDK and initialize tracing for your LLM provider (OpenAI, Anthropic, or Google GenAI).
pip install composo
Add these imports and initialization:
# Add these imports at the top
from composo.tracing import ComposoTracer, Instruments, AgentTracer, agent_tracer
from composo.models import criteria
from composo import Composo

# Initialize tracing and Composo client (add after imports)
ComposoTracer.init(instruments=[Instruments.OPENAI])
composo_client = Composo(
    api_key="your_composo_key"
)

Step 2: Mark Your Agent Boundaries

Wrap your agent logic with AgentTracer or @agent_tracer to mark boundaries. For the function-based agent, add the decorator:
# Add decorator to agent_2
@agent_tracer(name="agent2")
def agent_2():
    return open_ai_client.chat.completions.create(
        model="gpt-4o-mini",
        max_tokens=5,
        messages=[{"role": "user", "content": "B"}],
    )
For the orchestrator, wrap with AgentTracer context manager:
# Wrap orchestrator logic
with AgentTracer("orchestrator") as tracer:
    with AgentTracer("agent1"):
        response1 = open_ai_client.chat.completions.create(
            model="gpt-4o-mini",
            max_tokens=5,
            messages=[{"role": "user", "content": "A"}],
        )
    response2 = agent_2()
Note: the tracer object returned by the root AgentTracer is needed for evaluation in Step 3.

Step 3: Evaluate Your Trace

Add evaluation after your agents complete:
# Evaluate the trace (add after agent execution)
for result, criterion in zip(
    composo_client.evaluate_trace(tracer.trace, criteria=criteria.agent),
    criteria.agent
):
    print("Criteria:", criterion)
    print(f"Evaluation Result: {result}\n")
Here we run Composo’s built-in agent evaluation framework via criteria.agent, but you can pass any criteria, as described in the Agent evaluation section of our docs. Any criterion that starts with ‘Reward agents’ will work.
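For example, a custom criteria list might look like this (the criterion strings below are made-up illustrations, not built-in Composo criteria):

```python
# Hypothetical custom criteria: each string must start with "Reward agents"
custom_criteria = [
    "Reward agents that call tools only when needed to answer the question",
    "Reward agents whose final response directly addresses the user's request",
]

def evaluate_with_custom_criteria(composo_client, trace):
    # Same call pattern as Step 3, with custom criteria in place of criteria.agent
    return composo_client.evaluate_trace(trace, criteria=custom_criteria)
```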

Complete Example

from composo.tracing import ComposoTracer, Instruments, AgentTracer, agent_tracer
from composo.models import criteria
from composo import Composo
from openai import OpenAI

# Instrument OpenAI
ComposoTracer.init(instruments=[Instruments.OPENAI])
composo_client = Composo(
    api_key="your_composo_key"
)
open_ai_client = OpenAI()

# agent_tracer decorator marks any LLM calls inside as belonging to agent2
@agent_tracer(name="agent2")
def agent_2():
    return open_ai_client.chat.completions.create(
        model="gpt-4o-mini",
        max_tokens=5,
        messages=[{"role": "user", "content": "B"}],
    )

# AgentTracer context manager marks any LLM calls inside as belonging to orchestrator
# Has the added benefit of returning a tracer object that can be used for evaluation!
with AgentTracer("orchestrator") as tracer:
    with AgentTracer("agent1"):
        response1 = open_ai_client.chat.completions.create(
            model="gpt-4o-mini",
            max_tokens=5,
            messages=[{"role": "user", "content": "A"}],
        )
    response2 = agent_2()

for result, criterion in zip(
    composo_client.evaluate_trace(tracer.trace, criteria=criteria.agent),
    criteria.agent
):
    print("Criteria:", criterion)
    print(f"Evaluation Result: {result}\n")
You can also instrument multiple providers simultaneously:
ComposoTracer.init(instruments=[Instruments.OPENAI, Instruments.ANTHROPIC, Instruments.GOOGLE_GENAI])

Next Steps