Evaluations (Evals)

Systematic testing of agent performance: accuracy, safety, reliability.

Why it matters

You wouldn't deploy software without tests. Evals are the test suite for AI agents.

In practice

Our QA Judge subagent checks every story in our PRD against boolean pass/fail criteria, which gives us automated quality assurance.
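
A boolean pass/fail eval of this kind might look roughly like the sketch below. It is illustrative only, not the QA Judge's actual implementation: the `Story` shape, the two criteria, and the function names are all hypothetical, and a real judge would likely call a model rather than rely on simple string checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Story:
    """Hypothetical shape of a PRD story; the real schema is not given in this entry."""
    id: str
    text: str
    acceptance_criteria: List[str]


# Each criterion is a named function that returns True (pass) or False (fail).
Criterion = Callable[[Story], bool]

# Assumed example criteria, chosen only to illustrate the pattern.
CRITERIA: Dict[str, Criterion] = {
    "has_acceptance_criteria": lambda s: len(s.acceptance_criteria) > 0,
    "names_a_user": lambda s: s.text.lower().startswith("as a"),
}


def judge(story: Story) -> Dict[str, bool]:
    """Run every criterion against one story and record pass/fail per criterion."""
    return {name: check(story) for name, check in CRITERIA.items()}


def run_evals(stories: List[Story]) -> Dict[str, Dict[str, bool]]:
    """Evaluate every story, keyed by story id."""
    return {story.id: judge(story) for story in stories}


def failing(results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the ids of stories that failed at least one criterion."""
    return [sid for sid, checks in results.items() if not all(checks.values())]
```

Keeping each criterion strictly boolean makes results easy to aggregate and compare between runs, at the cost of nuance that a graded score could capture.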
