In a paper published earlier this month, OpenAI researchers said they’d discovered why even the most advanced AI models still experience frequent “hallucinations,” where systems like ChatGPT assert false information with confidence.
They found that the way large language models, such as those powering ChatGPT, are evaluated means they're "optimized to excel in tests," and that "guessing when uncertain enhances test results."
In basic terms, AI creators incentivize guessing over admitting ignorance — an approach that might work on exams, but is dangerous when a system is giving crucial advice on subjects like medicine or law.
While OpenAI suggested in a blog post that “a straightforward fix” exists — by adjusting evaluations to “penalize confident errors more than uncertainty and reward partial credit for appropriate uncertainty” — one expert warns this strategy could have severe business implications.
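The incentive problem comes down to simple expected-value arithmetic. The sketch below is illustrative only — the function and penalty values are assumptions, not OpenAI's actual evaluation code — but it shows why binary grading rewards guessing while the proposed scheme does not:

```python
def expected_score(p_correct, abstain, wrong_penalty=0.0, abstain_credit=0.0):
    """Expected benchmark score for one question under a given grading scheme."""
    if abstain:
        return abstain_credit
    # A guess earns 1 point when right and loses wrong_penalty when wrong.
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Binary grading (common today): wrong answers cost nothing, so
# guessing beats saying "I don't know" even at 10% confidence.
guess_binary = expected_score(0.10, abstain=False)      # 0.10
abstain_binary = expected_score(0.10, abstain=True)     # 0.00

# Penalized grading (the proposed fix, with hypothetical values):
# confident errors are penalized and appropriate uncertainty earns
# partial credit, so abstaining now beats a low-confidence guess.
guess_penalized = expected_score(0.10, abstain=False, wrong_penalty=0.5)    # -0.35
abstain_penalized = expected_score(0.10, abstain=True, abstain_credit=0.25) # 0.25
```

Under binary grading a model maximizes its score by always answering, no matter how unsure it is; flip the penalties and the same optimization pressure teaches it to abstain.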
In an essay for The Conversation, University of Sheffield lecturer and AI optimization expert Wei Xing contended that the AI industry lacks economic incentives for these changes, as they might drastically raise costs.
Furthermore, if AI frequently acknowledges its uncertainty, it might deter users, who often prefer confident responses, even if incorrect.
If ChatGPT admitted uncertainty in even 30 percent of its answers, users might become frustrated and leave, Xing argued.
“Users used to receiving confident answers to any question would likely leave such systems quickly,” the researcher wrote.
While there are "established methods for measuring uncertainty," applying them would require "much more computation than current methods," he argued, "as they must analyze multiple responses and gauge confidence levels."
“For a system handling millions of queries daily, this means much higher operational expenses,” Xing wrote.
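Xing's cost argument can be seen in one common way of estimating uncertainty: sample the model several times and treat agreement among the samples as a rough confidence score (a self-consistency approach). This is a hypothetical sketch — `ask_model` and `fake_model` are stand-ins, not a real API:

```python
import itertools
from collections import Counter

def answer_with_confidence(ask_model, prompt, n_samples=5):
    # Query the model several times; the share of samples agreeing with
    # the most common answer serves as a rough confidence estimate.
    answers = [ask_model(prompt) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

# Stand-in "model" that mostly agrees with itself (4 of 5 samples).
fake_model = itertools.cycle(["Paris", "Paris", "Lyon", "Paris", "Paris"])
answer, confidence = answer_with_confidence(
    lambda prompt: next(fake_model), "Capital of France?"
)
# answer == "Paris", confidence == 0.8
```

The catch is the denominator: every user query now costs `n_samples` model calls instead of one — a roughly 5x increase in inference cost at five samples, which is exactly the operational expense Xing is pointing to.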
Increasing expenses now could be disastrous. AI firms have invested heavily in scaling, pouring money into infrastructure for ever more power-hungry models. Yet a return on that investment appears to be many years, if not decades, away; so far, tens of billions of dollars in capital expenditures have eclipsed modest revenues.
In other words, raising already significant operational costs — while losing users — could be another major challenge for companies like OpenAI as they try to convince investors of a viable long-term business model.
Xing argued that the suggested fixes for hallucinations might make sense for "AI systems managing crucial business operations or economic infrastructure," where "the harm of hallucinations outweighs the cost of models judging their uncertainty."
“However, consumer applications still dominate AI development priorities,” he added. “Users want systems that provide confident answers to any question.”
Serving up a quick, confident guess is simply cheaper for companies than computing uncertainty, which discourages the more cautious approach that would produce fewer hallucinations.
How this unfolds in the long run is uncertain, especially as market dynamics change and companies find better ways to operate their AI models.
But one thing is likely to remain: guessing will always be a cheaper option.
“In short, the OpenAI paper inadvertently exposes an uncomfortable reality,” Xing concluded. “The business incentives in consumer AI development remain largely at odds with reducing hallucinations.”
“Until these incentives shift, hallucinations will continue,” he added.


