Overview
TheComposo class provides a synchronous client for evaluating chat messages against custom criteria. Suitable for single evaluations or small batch scenarios with automatic retry mechanisms.
Constructor
Parameters
Your Composo API key for authentication. If not provided, will be loaded from the
COMPOSO_API_KEY environment variable.API base URL. Change only if using a custom Composo deployment.
Number of retries on request failure. Each retry uses exponential backoff with jitter. Minimum value is 1 (retries cannot be disabled).
Optional model core identifier for specifying the evaluation model. If not provided, uses the default evaluation model.
Request timeout in seconds. Total time to wait for a single request (including retries).
Example
evaluate()
Evaluate messages against one or more evaluation criteria.Parameters
List of chat messages to evaluate. Each message should be a dictionary with
role and content keys.Supported roles: system, user, assistant, toolExample:Evaluation criterion or list of criteria. Can be a custom criterion string or use pre-built criteria from
composo.criteria.Example:Optional system message to set AI behavior and context for the evaluation.
Optional list of tool definitions for evaluating tool calls. Each tool should follow the OpenAI function calling format.
Optional LLM result to append to the conversation for evaluation.
If
False, returns a dictionary with task_id instead of blocking for results. Use for async job submission.Returns
- Returns single
EvaluationResponseif one criterion provided - Returns
list[EvaluationResponse]if multiple criteria provided - Returns
dictwithtask_idifblock=False
Response Schema
EvaluationResponseEvaluation score between 0.0 and 1.0. Returns
null if the criterion was deemed not applicable.Detailed explanation of the evaluation score and reasoning.
Examples
Basic Evaluation
Multiple Criteria Evaluation
Tool Call Evaluation
Non-blocking Evaluation
evaluate_trace()
Evaluate multi-agent traces with full conversation history across multiple agents.Parameters
Multi-agent trace object containing agent interactions, initial input, and final output.
Evaluation criterion or list of criteria for trace evaluation.
Optional model core identifier for trace evaluation.
If
False, returns a dictionary with task_id instead of blocking for results.Returns
- Returns single
MultiAgentTraceResponseif one criterion provided - Returns
list[MultiAgentTraceResponse]if multiple criteria provided - Returns
dictwithtask_idifblock=False
Response Schema
MultiAgentTraceResponsePer-agent evaluation scores mapping agent IDs to their individual scores.
Overall trace score aggregated across all agents.
Detailed explanation of the trace evaluation.
The criterion that was evaluated.
Example
Context Manager Usage
TheComposo client supports context managers for automatic resource cleanup: