Agent Correctness: Evaluating Tool Use Errors and Hallucinations
Text hallucinations get all the attention in LLM evaluation. But the more expensive failure mode in production agents is tool use: calling the wrong endpoints, inventing parameters, and executing...










