Knowledge Distillation
Compressing a larger model's behavior into a smaller model to reduce cost and latency. A smaller "student" model is trained to mimic the outputs of a larger "teacher" model.
Why it matters
Large models are smart but expensive. Knowledge distillation creates smaller models that capture most of the larger model's capability at a fraction of the cost.
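The standard recipe trains the student against the teacher's softened output distribution rather than hard labels. A minimal sketch of that loss, following the soft-target formulation from Hinton et al. (2015), using NumPy (function names here are illustrative, not from any particular library):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher T gives softer distributions."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Soft targets carry more signal than hard labels: they encode how
    similar the teacher thinks the wrong answers are to the right one.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 as in the original paper
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return (temperature ** 2) * kl.mean()

# A student whose logits already match the teacher's has zero loss;
# any mismatch makes the loss positive.
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[1.0, 3.0, 0.5]])
print(distillation_loss(teacher, teacher))   # exactly matched: 0.0
print(distillation_loss(student, teacher))   # mismatched: > 0
```

In practice this term is usually mixed with an ordinary cross-entropy loss on the true labels, but the KL term above is the part that transfers the teacher's "dark knowledge".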
In practice
Our Ollama-first approach is similar in spirit: use smaller local models for routine tasks and only escalate to Claude when needed.
Related terms
Ollama
A tool for running AI models locally. Free, private, fast.
Quantization
Reducing a model's numerical precision to decrease size, cost, and inference time.
Inference
The process of an AI model generating a response or prediction from input data.
LLM (Large Language Model)
A neural network trained on vast amounts of text to understand and generate language; examples include Claude, GPT, and Gemini. The "brain" behind AI assistants.
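Quantization, mentioned above, pairs naturally with distillation as a compression technique. A minimal sketch of symmetric int8 weight quantization (the function names and the specific scheme are illustrative):

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights onto the int8 range [-127, 127].

    One scale factor per tensor: the largest-magnitude weight maps to 127.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 4x smaller than float32, and the round-trip
# error per weight is at most half the scale factor.
```

Real quantization schemes (per-channel scales, 4-bit formats, activation quantization) are more involved, but the size/precision trade-off is the same idea.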