— Concepts
AI Evaluation (Eval)
The discipline of measuring whether an AI system is actually doing the job correctly.
Also known as: LLM eval · Model evaluation · AI testing
What is AI Evaluation (Eval)?
AI evaluation — usually shortened to 'evals' — is how you measure LLM output quality on a defined task. Evals can be deterministic (regex match, exact answer) or LLM-judged (one model grades another). Without evals you cannot tell whether a prompt change made things better or worse, whether a new model upgrade is safe to ship, or whether your agent is regressing in production. Frameworks like Anthropic Evals, OpenAI Evals, Promptfoo, and Inspect AI are standard in 2026.
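The deterministic style described above can be sketched in a few lines: a small test set of prompts, each paired with an exact-match or regex check, scored into a pass rate. The `model` function and the test cases here are hypothetical stand-ins, not part of any named framework.

```python
import re

# Stand-in for an LLM call; hard-coded so the sketch runs offline.
def model(prompt: str) -> str:
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris",
    }
    return canned.get(prompt, "I don't know.")

# Each case pairs a prompt with a deterministic check:
# an exact expected answer, or a regex the output must satisfy.
CASES = [
    {"prompt": "What is 2 + 2?", "regex": r"\b4\b"},
    {"prompt": "Capital of France?", "exact": "Paris"},
]

def run_evals(cases) -> float:
    """Run every case against the model and return the pass rate."""
    passed = 0
    for case in cases:
        out = model(case["prompt"])
        if "exact" in case:
            ok = out.strip() == case["exact"]
        else:
            ok = re.search(case["regex"], out) is not None
        passed += ok
    return passed / len(cases)

print(run_evals(CASES))  # 1.0 for the canned model above
```

Running the same harness before and after a prompt or model change turns "did it get better?" into a number you can compare; LLM-judged evals follow the same loop but replace the regex/exact check with a grading call to a second model.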
— Related
Terms connected to AI Evaluation (Eval)
Models
LLM (Large Language Model)
An AI model trained on huge amounts of text that can understand and generate human language.
Open →
Techniques
Fine-Tuning
Adjusting a pre-trained AI model on your specific data to change its behaviour.
Open →
Concepts
AI Agent
An AI system that decides its own next step and acts autonomously across multiple steps.
Open →
Concepts
Applied AI
The practical use of AI tools to ship products and outcomes.
Open →
From definitions to deployed projects.
Knowing what a term means is step one. ONROL's AI Generalist track gets you shipping projects that use it.
