Inference
The process by which a trained AI model generates a response or prediction from input data.
Why it matters
Inference is where the cost happens. Understanding it helps you optimize: batch queries, cache repeated results, and route each task to the cheapest model that can handle it.
In practice
We route inference strategically: FAQ matching avoids it entirely, Ollama handles simple tasks locally (free), Claude API is reserved for complex reasoning.
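The routing described above can be sketched as a small tiered dispatcher. The FAQ store, the fuzzy-match threshold, and the word-count complexity heuristic are all illustrative assumptions, not the actual production logic:

```python
from difflib import SequenceMatcher

# Hypothetical FAQ store; real systems might use embedding
# similarity rather than fuzzy string matching.
FAQ = {
    "what are your hours": "We are open 9am-5pm, Monday to Friday.",
}

def route(query: str) -> tuple[str, str]:
    """Return (tier, answer-or-placeholder) for a query.
    Tiers: 'faq' (no inference), 'local' (e.g. Ollama), 'api' (e.g. Claude)."""
    q = query.lower().strip("?! .")
    # Tier 1: near-exact FAQ match avoids inference entirely.
    for question, answer in FAQ.items():
        if SequenceMatcher(None, q, question).ratio() > 0.9:
            return ("faq", answer)
    # Tier 2: crude complexity heuristic -- short queries stay local.
    if len(query.split()) <= 12:
        return ("local", "handled by local model")
    # Tier 3: everything else goes to the paid API.
    return ("api", "handled by hosted API")
```

The key design point is that the cheapest tier is tried first, and each tier only sees traffic the tier above it could not absorb.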
Related terms
LLM (Large Language Model)
An AI model trained on vast amounts of text, such as Claude, GPT, or Gemini. The "brain" that understands and generates language.
Latency
The time it takes for an agent to respond or act.
Cost Tracking
Monitoring every AI call: model used, tokens in/out, cost, cache status. Essential for profitable operations.
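A tracking record like this can be sketched in a few lines. The per-million-token prices below are illustrative placeholders, not current rates, and the model names are assumptions for the example:

```python
from dataclasses import dataclass

# Illustrative prices per million tokens -- real rates vary by model
# and change over time; check the provider's pricing page.
PRICE_PER_MTOK = {
    "claude": {"in": 3.00, "out": 15.00},
    "ollama-local": {"in": 0.0, "out": 0.0},
}

@dataclass
class CallRecord:
    model: str
    tokens_in: int
    tokens_out: int
    cache_hit: bool

    @property
    def cost(self) -> float:
        if self.cache_hit:  # cached responses cost nothing
            return 0.0
        p = PRICE_PER_MTOK[self.model]
        return (self.tokens_in * p["in"] + self.tokens_out * p["out"]) / 1_000_000

log: list[CallRecord] = []

def track(record: CallRecord) -> None:
    log.append(record)

def total_cost() -> float:
    return sum(r.cost for r in log)
```

Logging cache status alongside cost is what makes the earlier optimizations measurable: you can see exactly how much each cache hit or local routing decision saved.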
Ollama
A tool for running AI models locally. Free, private, fast.