Inference
The process of an AI model generating a response to your input, as opposed to the training phase when it originally learned.
What It Is
Inference is what happens every time you ask an AI model a question and it produces an answer. It is the “using” phase, as opposed to the “learning” phase (training). During training, the model processed massive datasets over weeks or months to build its internal knowledge. During inference, it applies that learned knowledge to your specific prompt in real time. Every chat message, every API call, every automated workflow that touches an AI model is performing inference. The model’s weights are frozen at this point. It is not learning anything new, just applying what it already knows.
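The frozen-weights point can be illustrated with a toy model: inference is just applying fixed numbers to new input, with no update step. (The weights and the weighted-sum "model" below are made up for illustration, not how any real model works internally.)

```python
# Toy illustration: inference applies frozen weights, never changes them.
WEIGHTS = [0.4, -0.2, 0.7]  # "learned" during training; fixed at inference time

def infer(features):
    """A single forward pass: a weighted sum of the input features."""
    return sum(w * x for w, x in zip(WEIGHTS, features))

out1 = infer([1.0, 2.0, 3.0])
out2 = infer([1.0, 2.0, 3.0])
# Same input, same output, and WEIGHTS is untouched: no learning occurred.
```

A real language model does the same thing at vastly larger scale: billions of fixed weights applied to your prompt, producing output without any of them changing.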
Why It Matters
Inference is where costs accumulate for operators. Training happens once (and is done by the model provider), but inference happens every single time someone uses the model. API pricing is based on inference: you pay for every input and output token processed. Understanding this distinction helps you think about costs, speed, and architecture. A model that is cheap to run inference on (like Claude Haiku or GPT-4o mini) might be the right choice for high-volume tasks, even if a larger model would give slightly better results.
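To see how per-token pricing shapes model choice at volume, here is a rough cost sketch. The model names and per-million-token rates are hypothetical placeholders, not published pricing; only the arithmetic is the point.

```python
# Hypothetical per-token rates (illustrative only, not real pricing).
# USD per 1M tokens: (input rate, output rate)
PRICES = {
    "small-model": (0.25, 1.25),
    "large-model": (3.00, 15.00),
}

def inference_cost(model, input_tokens, output_tokens):
    """Estimate the cost of one inference call under the rates above."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 10,000 calls of ~500 input / ~200 output tokens each:
calls = 10_000
small_total = calls * inference_cost("small-model", 500, 200)
large_total = calls * inference_cost("large-model", 500, 200)
print(f"small: ${small_total:.2f}, large: ${large_total:.2f}")
# → small: $3.75, large: $45.00
```

At these (made-up) rates the larger model costs 12x more for the same workload, which is why the small-but-good-enough model often wins for high-volume tasks.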
In Practice
When you set up an n8n workflow that sends 500 emails through an AI model for personalization, that is 500 inference calls, each consuming input and output tokens you pay for. Operators who understand inference think about batching, caching, and model selection to keep costs under control while maintaining quality.