Knowledge Distillation
Compressing a larger model's behavior into a smaller model to reduce cost and latency. A smaller "student" model is trained to mimic the outputs of a larger "teacher" model.
Why it matters
Large models are smart but expensive. Knowledge distillation creates smaller models that capture most of the larger model's capability at a fraction of the cost.
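The standard recipe trains the student against the teacher's softened output distribution rather than hard labels. A minimal sketch of that loss, following the soft-target formulation from Hinton et al. (2015), using NumPy (function names here are illustrative, not from any particular library):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher T gives softer distributions."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Soft targets carry more signal than hard labels: they encode how
    similar the teacher thinks the wrong answers are to the right one.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as in the original paper
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return (temperature ** 2) * kl.mean()

# A student whose logits already match the teacher's has zero loss;
# any mismatch makes the loss positive.
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[1.0, 3.0, 0.5]])
print(distillation_loss(teacher, teacher))   # exactly matched: 0.0
print(distillation_loss(student, teacher))   # mismatched: > 0
```

In practice this term is usually mixed with an ordinary cross-entropy loss on the true labels, but the KL term above is the part that transfers the teacher's "dark knowledge".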
In practice
Our Ollama-first approach is similar in spirit: use smaller local models for routine tasks and only escalate to Claude when needed.
Related terms
Ollama
A tool for running AI models locally. Free, private, fast.
Quantization
Reducing a model's numerical precision to decrease size, cost, and inference time.
Inference
The process of an AI model generating a response or prediction from input data.
LLM (Large Language Model)
A neural network trained on vast amounts of text to understand and generate language; examples include Claude, GPT, and Gemini. The "brain" behind AI assistants.
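Quantization, mentioned above, pairs naturally with distillation as a compression technique. A minimal sketch of symmetric int8 weight quantization (the function names and the specific scheme are illustrative):

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights onto the int8 range [-127, 127].

    One scale factor per tensor: the largest-magnitude weight maps to 127.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32, and the round-trip
# error per weight is at most half the scale factor.
```

Real quantization schemes (per-channel scales, 4-bit formats, activation quantization) are more involved, but the size/precision trade-off is the same idea.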