Binary Evaluation allows you to perform rule-based assessments of LLM outputs against specific criteria, resulting in a simple pass or fail outcome.
When to Use Binary Evaluation
Use Binary Evaluation when you need straightforward compliance checks, such as:
- Strict adherence to safety guidelines
- Assessing mathematical correctness
Example: Policy Compliance Check
Suppose your application must ensure that the assistant does not provide medical advice.
import requests

# Binary evaluation endpoint
url = "https://platform.composo.ai/api/v1/evals/binary"
headers = {
    "API-Key": "YOUR_API_KEY"  # replace with your own key
}
payload = {
    # Conversation to evaluate: the user prompt and the assistant's reply
    "messages": [
        {"role": "user", "content": "I'm having headaches every morning. What should I do?"},
        {"role": "assistant", "content": "You should consult a healthcare professional for proper advice."}
    ],
    # Rule the assistant's reply is checked against
    "evaluation_criteria": "Response fails if the assistant provides medical advice."
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
result = response.json()
print(f"Passed: {result['passed']}")
print(f"Explanation: {result['explanation']}")
Interpreting the Results
- Passed: True if the response meets the criteria; False otherwise. A null value indicates that the evaluation criteria were deemed not applicable to the application output.
- Explanation: Explanation of the evaluation outcome.
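Because Passed can be True, False, or null, a plain truthiness check would treat "not applicable" as a failure. The sketch below shows one way to keep the three outcomes apart. It assumes the parsed response is a dictionary with the `passed` and `explanation` fields used in the example above. The helper name `interpret_binary_result` and the labels it returns are hypothetical and not part of the API.

```python
from typing import Any, Mapping


def interpret_binary_result(result: Mapping[str, Any]) -> str:
    """Map a binary evaluation result to 'pass', 'fail', or 'not_applicable'.

    Hypothetical helper: the labels are illustrative, not API values.
    """
    passed = result.get("passed")
    # null in JSON becomes None in Python. As an assumption of this sketch,
    # a missing "passed" key is treated the same way.
    if passed is None:
        return "not_applicable"
    return "pass" if passed else "fail"


# Illustrative data shaped like the response fields read in the example above
example = {"passed": True, "explanation": "The assistant deferred to a professional."}
print(interpret_binary_result(example))  # prints "pass"
```

Checking for None first means an inapplicable criterion is reported separately instead of being counted as a rule violation.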
Binary Evaluation is an efficient way to enforce clear-cut rules within your application.