Overview

The Composo class provides a synchronous client for evaluating chat messages against custom criteria. It is suited to single evaluations and small batches, and retries failed requests automatically.

Constructor

from composo import Composo

client = Composo(
    api_key="your_api_key",
    base_url="https://platform.composo.ai",
    num_retries=1,
    model_core=None,
    timeout=60.0
)

Parameters

api_key
string
Your Composo API key for authentication. If not provided, will be loaded from the COMPOSO_API_KEY environment variable.
base_url
string
default:"https://platform.composo.ai"
API base URL. Change only if using a custom Composo deployment.
num_retries
integer
default:"1"
Number of retries on request failure. Each retry uses exponential backoff with jitter. Minimum value is 1 (retries cannot be disabled).
model_core
string
Optional model core identifier for specifying the evaluation model. If not provided, uses the default evaluation model.
timeout
float
default:"60.0"
Request timeout in seconds: the total time to wait for a request to complete, including any retries.
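
The retry schedule is described only as "exponential backoff with jitter"; as an illustration of what that typically means (not the SDK's actual implementation), the delay between attempts grows as a capped power of two, with the actual sleep drawn uniformly below that nominal value:

```python
import random

def backoff_delays(num_retries, base=1.0, cap=30.0):
    """Illustrative exponential backoff with full jitter.

    For attempt n, the nominal delay is base * 2**n, capped at `cap`;
    the actual sleep is drawn uniformly from [0, nominal].
    """
    delays = []
    for attempt in range(num_retries):
        nominal = min(cap, base * 2 ** attempt)
        delays.append(random.uniform(0, nominal))
    return delays

# Three retries: nominal delays of 1s, 2s, 4s, each jittered down.
print(backoff_delays(3))
```

Full jitter keeps concurrent clients from retrying in lockstep against a recovering server.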

Example

from composo import Composo

# Using API key directly
client = Composo(api_key="your_api_key_here")

# Using environment variable
import os
os.environ["COMPOSO_API_KEY"] = "your_api_key_here"
client = Composo()

# With custom configuration
client = Composo(
    api_key="your_api_key",
    num_retries=3,
    timeout=120.0
)

evaluate()

Evaluate messages against one or more evaluation criteria.

result = client.evaluate(
    messages=[...],
    criteria="Your evaluation criterion",
    system=None,
    tools=None,
    result=None,
    block=True
)

Parameters

messages
list[dict]
required
List of chat messages to evaluate. Each message should be a dictionary with role and content keys.

Supported roles: system, user, assistant, tool

Example:
[
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]
criteria
string | list[string]
Evaluation criterion or list of criteria. Can be a custom criterion string or use pre-built criteria from composo.criteria.

Example:
"Reward helpful and accurate responses"
# or
["Criterion 1", "Criterion 2", "Criterion 3"]
system
string
Optional system message to set AI behavior and context for the evaluation.
tools
list[dict]
Optional list of tool definitions for evaluating tool calls. Each tool should follow the OpenAI function calling format.
result
dict
Optional LLM result to append to the conversation for evaluation.
block
boolean
default:"True"
If False, returns a dictionary with task_id instead of blocking for results. Use for async job submission.

Returns

result
EvaluationResponse | list[EvaluationResponse]
  • Returns single EvaluationResponse if one criterion provided
  • Returns list[EvaluationResponse] if multiple criteria provided
  • Returns dict with task_id if block=False

Response Schema

EvaluationResponse
score
float | null
Evaluation score between 0.0 and 1.0. Returns null if the criterion was deemed not applicable.
explanation
string
Detailed explanation of the evaluation score and reasoning.
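
Because score is null (None in Python) when a criterion is judged not applicable, callers should guard against None before formatting or averaging. A minimal sketch, with a simple dataclass standing in for EvaluationResponse:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalResult:
    """Stand-in for composo's EvaluationResponse, for this sketch only."""
    score: Optional[float]
    explanation: str

def summarize(result):
    """Format a result, treating a null score as 'not applicable'."""
    if result.score is None:
        return f"n/a - {result.explanation}"
    return f"{result.score:.2f} - {result.explanation}"

print(summarize(EvalResult(0.95, "Accurate answer")))
# 0.95 - Accurate answer
print(summarize(EvalResult(None, "Criterion not applicable")))
# n/a - Criterion not applicable
```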

Examples

Basic Evaluation

from composo import Composo

client = Composo()

messages = [
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
]

result = client.evaluate(
    messages=messages,
    criteria="Reward accurate and informative responses"
)

print(f"Score: {result.score}")
# Output: Score: 0.95

print(f"Explanation: {result.explanation}")
# Output: Explanation: The response correctly identifies Paris as the capital of France...

Multiple Criteria Evaluation

results = client.evaluate(
    messages=[...],
    criteria=[
        "Reward accurate information",
        "Reward clear communication",
        "Penalize overly technical jargon"
    ]
)

for result in results:
    print(f"Score: {result.score} - {result.explanation}")
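
Assuming results come back in the same order as the criteria list (the return shape above suggests this, though it isn't stated explicitly), pairing them up makes reports easier to read. A sketch, with a namedtuple standing in for EvaluationResponse:

```python
from collections import namedtuple

# Stand-in for EvaluationResponse, used only for this sketch.
Result = namedtuple("Result", ["score", "explanation"])

def by_criterion(criteria, results):
    """Map each criterion string to its result, assuming matching order."""
    if len(criteria) != len(results):
        raise ValueError("criteria/results length mismatch")
    return dict(zip(criteria, results))

scores = by_criterion(
    ["Reward accurate information", "Reward clear communication"],
    [Result(0.9, "Accurate"), Result(0.7, "Mostly clear")],
)
print(scores["Reward clear communication"].score)
# 0.7
```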

Tool Call Evaluation

messages = [
    {"role": "user", "content": "What's the weather in SF?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"location": "San Francisco"}'
            }
        }]
    },
    {
        "role": "tool",
        "tool_call_id": "call_123",
        "content": '{"temp": 65, "condition": "sunny"}'
    },
    {"role": "assistant", "content": "It's 65°F and sunny in San Francisco!"}
]

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]

result = client.evaluate(
    messages=messages,
    tools=tools,
    criteria="Reward correct tool usage and accurate responses"
)

Non-blocking Evaluation

# Submit evaluation without waiting
response = client.evaluate(
    messages=[...],
    criteria="Your criterion",
    block=False
)

task_id = response["task_id"]
print(f"Task submitted with ID: {task_id}")
# Use task_id to check status later
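
For fire-and-forget batches, block=False lets you submit every conversation first and collect the task IDs for later retrieval. A sketch of the submission side only (the status/retrieval API is not covered on this page):

```python
def submit_batch(client, conversations, criterion):
    """Submit each conversation without blocking; return its task IDs.

    `client` is any object with an evaluate() method matching the
    signature documented above (a Composo client, or a stub in tests).
    """
    task_ids = []
    for messages in conversations:
        resp = client.evaluate(messages=messages, criteria=criterion, block=False)
        task_ids.append(resp["task_id"])
    return task_ids
```

Each call returns immediately with a dict containing task_id, per the Returns section above.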

evaluate_trace()

Evaluate multi-agent traces with full conversation history across multiple agents.

result = client.evaluate_trace(
    trace=trace_object,
    criteria="Your evaluation criterion",
    model_core=None,
    block=True
)

Parameters

trace
MultiAgentTrace
required
Multi-agent trace object containing agent interactions, initial input, and final output.
criteria
string | list[string]
required
Evaluation criterion or list of criteria for trace evaluation.
model_core
ModelCore
Optional model core identifier for trace evaluation.
block
boolean
default:"True"
If False, returns a dictionary with task_id instead of blocking for results.

Returns

result
MultiAgentTraceResponse | list[MultiAgentTraceResponse]
  • Returns single MultiAgentTraceResponse if one criterion provided
  • Returns list[MultiAgentTraceResponse] if multiple criteria provided
  • Returns dict with task_id if block=False

Response Schema

MultiAgentTraceResponse
agent_scores
dict
Per-agent evaluation scores mapping agent IDs to their individual scores.
overall_score
float
Overall trace score aggregated across all agents.
explanation
string
Detailed explanation of the trace evaluation.
criterion
string
The criterion that was evaluated.
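
Since agent_scores maps agent IDs to per-agent scores, finding the weakest links in a trace is a single dictionary pass. A sketch, assuming the mapped values are plain floats:

```python
def weakest_agents(agent_scores, n=1):
    """Return the n lowest-scoring (agent_id, score) pairs, ascending."""
    return sorted(agent_scores.items(), key=lambda kv: kv[1])[:n]

print(weakest_agents({"planner": 0.9, "researcher": 0.4, "writer": 0.7}))
# [('researcher', 0.4)]
```

Useful for deciding which agent in a multi-agent pipeline to iterate on first.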

Example

from composo import Composo, ComposoTracer, Instruments, AgentTracer
from openai import OpenAI

# Initialize tracing
ComposoTracer.init(instruments=Instruments.OPENAI)
openai_client = OpenAI()
composo_client = Composo()

# Use AgentTracer context manager to capture trace
with AgentTracer(name="research_agent") as tracer:
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Research: quantum computing"}]
    )
    result = response.choices[0].message.content

    # Get the trace object
    trace = tracer.trace

# Evaluate the captured trace
evaluation = composo_client.evaluate_trace(
    trace=trace,
    criteria="Reward thorough research and accurate information"
)

print(f"Overall Score: {evaluation.overall_score}")
print(f"Explanation: {evaluation.explanation}")

Context Manager Usage

The Composo client supports context managers for automatic resource cleanup:

with Composo() as client:
    result = client.evaluate(
        messages=[...],
        criteria="Your criterion"
    )
    print(result.score)
# Client automatically closed