

Agent Correctness: Evaluating Tool Use Errors and Hallucinations
Text hallucinations get most of the attention in LLM evaluation. But the more expensive failure mode in production agents is incorrect tool use: calling the wrong endpoints, inventing parameters, and executing valid actions that solve the wrong problem. Here is how to measure and improve agent correctness.
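
To make the three failure modes concrete, here is a minimal sketch of how a single tool call might be labeled during evaluation. Everything in it is an assumption made for illustration: the `TOOL_SCHEMAS` registry, the tool names `search_orders` and `refund_order`, the `ToolCall` type, and the error labels are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass, field

# Hypothetical registry: tool name -> set of parameter names the tool accepts.
TOOL_SCHEMAS = {
    "search_orders": {"customer_id", "status"},
    "refund_order": {"order_id", "amount"},
}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def classify_tool_call(call: ToolCall, expected_tool: str) -> list[str]:
    """Return error labels for one tool call; an empty list means no error was found."""
    # The agent called something that does not exist in the registry.
    if call.name not in TOOL_SCHEMAS:
        return ["unknown_tool"]

    errors = []

    # The tool exists, but the agent passed parameters the schema does not define.
    invented = set(call.args) - TOOL_SCHEMAS[call.name]
    if invented:
        errors.append("invented_parameter")

    # The call may be well-formed, but it is not the tool the labeled task required.
    if call.name != expected_tool:
        errors.append("wrong_tool")

    return errors


if __name__ == "__main__":
    call = ToolCall("refund_order", {"order_id": "A1", "reason": "late"})
    print(classify_tool_call(call, expected_tool="refund_order"))
```

The `expected_tool` argument stands in for a labeled reference answer. A schema check alone can catch unknown tools and invented parameters, but a valid call that solves the wrong problem can only be detected by comparing against what the task actually required.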