LLM-as-Judge
Using a second AI model to evaluate the quality of a primary agent's output.
Why it matters
An LLM-as-Judge can automatically evaluate thousands of agent outputs for accuracy and safety, scaling quality checks far beyond what manual review can cover.
In practice
Our QA Judge subagent validates each feature against boolean pass/fail criteria defined in the product requirements document (PRD).
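The pattern above, asking a judge model for a boolean PASS/FAIL verdict on each criterion, can be sketched as follows. This is a minimal illustration, not the actual QA Judge implementation: the prompt wording and the `call_llm` hook are assumptions, and any model client can be plugged in for the stub.

```python
def build_judge_prompt(output: str, criterion: str) -> str:
    """Prompt the judge model for a boolean PASS/FAIL verdict (wording is illustrative)."""
    return (
        "You are a QA judge. Reply with exactly PASS or FAIL.\n"
        f"Criterion: {criterion}\n"
        f"Agent output: {output}\n"
    )

def judge(output: str, criteria: list[str], call_llm) -> dict[str, bool]:
    """Evaluate one agent output against each pass/fail criterion."""
    results = {}
    for criterion in criteria:
        verdict = call_llm(build_judge_prompt(output, criterion))
        results[criterion] = verdict.strip().upper() == "PASS"
    return results

# Hypothetical stand-in for a real model call, so the sketch runs offline:
# it fails any criterion mentioning "email" and passes the rest.
def fake_llm(prompt: str) -> str:
    return "FAIL" if "email" in prompt else "PASS"

results = judge(
    "Added a login form with client-side validation",
    ["Feature includes a login form", "Feature sends a confirmation email"],
    fake_llm,
)
print(results)
```

Keeping the verdict boolean (rather than a 1-10 score) makes results easy to aggregate and removes ambiguity about what counts as a pass.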
Related terms
Evaluations (Evals)
Systematic testing of agent performance across accuracy, safety, and reliability.
Guardrails
Rules and constraints that prevent an agent from taking harmful or unauthorized actions.
Agent Team
A group of specialized agents that communicate directly with each other and divide work collaboratively.