LLM-as-Judge

The practice of using a second AI model, the judge, to evaluate the quality of a primary agent's output.

Why it matters

An LLM-as-Judge can automatically evaluate thousands of agent outputs for accuracy and safety, a volume that is impractical to review by hand.

In practice

Our QA Judge subagent validates each feature against boolean pass/fail criteria in the PRD.
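
A minimal sketch of how boolean pass/fail judging might be wired up, not the actual QA Judge implementation. All names here (Criterion, Verdict, judge_feature, ask_judge) are hypothetical, and the judge model is passed in as a plain function so that no specific model API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One pass/fail acceptance criterion, e.g. copied from a PRD (hypothetical shape)."""
    id: str
    description: str


@dataclass
class Verdict:
    criterion_id: str
    passed: bool


# Assumption: the judge model is reachable through a function that takes a
# prompt string and returns the model's text reply.
JudgeFn = Callable[[str], str]


def build_prompt(criterion: Criterion, feature_output: str) -> str:
    """Ask the judge for a one-word answer so the reply is easy to parse."""
    return (
        "You are a QA judge. Reply with exactly PASS or FAIL.\n"
        f"Criterion: {criterion.description}\n"
        f"Feature output:\n{feature_output}\n"
    )


def parse_verdict(reply: str) -> bool:
    """Treat only an exact PASS as a pass; anything ambiguous counts as a fail."""
    return reply.strip().upper() == "PASS"


def judge_feature(
    feature_output: str, criteria: List[Criterion], ask_judge: JudgeFn
) -> List[Verdict]:
    """Evaluate one feature's output against every criterion independently."""
    return [
        Verdict(c.id, parse_verdict(ask_judge(build_prompt(c, feature_output))))
        for c in criteria
    ]


def feature_passes(verdicts: List[Verdict]) -> bool:
    """A feature is accepted only if every criterion passed."""
    return all(v.passed for v in verdicts)
```

Judging each criterion in its own call keeps the verdicts independent, and failing on any reply other than PASS means a malformed judge response cannot silently approve a feature. In real use, ask_judge would wrap a call to whichever model serves as the judge.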
