Just over a month has passed since OpenAI launched its much-anticipated GPT-5 large language model (LLM), and the model has continued to produce an astonishing volume of strange inaccuracies. AI experts at the Discovery Institute's Walter Bradley Center for Artificial Intelligence, unhappy Reddit users on r/ChatGPTPro, and even OpenAI CEO Sam Altman himself have provided ample evidence that OpenAI's claim of GPT-5 having "PhD-level intelligence" requires significant caveats.
In a Reddit post, one user reported that GPT-5 was generating "wrong information on basic facts over half the time," and worried that without fact-checking, other inaccuracies would have gone unnoticed. The experience underscores how frequently chatbots hallucinate, the industry's term for confidently fabricating information. Although hallucination is hardly exclusive to ChatGPT, OpenAI's latest LLM seems especially prone to it, undercutting the company's assertion that GPT-5 hallucinates less than its predecessors.
In a recent blog post about hallucinations, in which OpenAI once more claimed that GPT-5 produces "significantly fewer" of them, the company sought to explain how and why these inaccuracies emerge. "Hallucinations persist partly because current evaluation methods set the wrong incentives," the September 5 blog states. "While evaluations themselves do not directly cause hallucinations, most evaluations measure model performance in a way that encourages guessing rather than honesty about uncertainty."
In other words, LLMs hallucinate because they are trained to produce correct answers, even if that means guessing. Some models, like Anthropic's Claude, have been trained to admit when they don't know something, whereas OpenAI's have not, leading to incorrect guesses. The Reddit user flagged massive factual errors when asking about various countries' GDPs, with some figures "literally double the actual values." Poland, for instance, was listed as having a GDP surpassing two trillion dollars, while the IMF puts it at around $979 billion; the error may stem from the Polish president's recent claim that the country's economy, not its GDP, had surpassed the $1 trillion mark.
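To see why accuracy-only scoring rewards guessing, consider a toy calculation. The sketch below is purely illustrative (the numbers and function are hypothetical, not OpenAI's actual evaluation code): under a grader that awards one point per correct answer and zero for "I don't know," a model that always guesses beats one that honestly abstains.

```python
# Toy illustration of the incentive problem OpenAI describes:
# a grader that only rewards correct answers makes guessing
# strictly better than admitting uncertainty.
# All numbers here are hypothetical, chosen for illustration.

def expected_score(p_known: float, p_lucky_guess: float, abstain: bool) -> float:
    """Expected points per question under a 1-point-per-correct-answer grader.

    p_known: fraction of questions the model actually knows.
    p_lucky_guess: chance a blind guess happens to be right.
    abstain: if True, the model says "I don't know" when unsure (0 points).
    """
    if abstain:
        return p_known  # points only from questions it truly knows
    # Otherwise it guesses on the rest and sometimes gets lucky.
    return p_known + (1 - p_known) * p_lucky_guess

# A model that knows 60% of answers and guesses right 10% of the time:
print(expected_score(0.6, 0.1, abstain=False))  # 0.64 -> guessing scores higher
print(expected_score(0.6, 0.1, abstain=True))   # 0.60 -> honesty scores lower
```

Under that kind of scoring, confident fabrication is never penalized more than silence, which is the incentive problem the blog post describes.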
"The scary part? I only noticed these errors because some answers seemed so off that they made me suspicious," the user wrote. "For instance, when I saw GDP numbers that seemed way too high, I double-checked and found they were completely wrong." The user wondered, "How many times do I NOT fact-check and just accept the wrong information as truth?"
Meanwhile, AI skeptic Gary Smith of the Walter Bradley Center described running three simple experiments with GPT-5 since its launch: a modified tic-tac-toe game, queries about financial advice, and a request to draw a possum with five labeled parts, all intended to show that GPT-5 is far from displaying PhD-level expertise. The possum experiment produced especially egregious results: GPT-5 offered technically correct names for the animal's parts but placed the labels incorrectly, identifying a leg as a nose and the tail as the back left foot. When Smith tried to replicate the experiment for a more recent post, a typo ("posse" instead of "possum") sent GPT-5's labeling even further astray.
Instead of a possum, the LLM generated its interpretation of a posse: five cowboys, some armed, with lines marking various parts. Some labels (the head, foot, and possibly ear) were correct, while the "shoulder" label pointed at one cowboy's hat, and "fand," possibly a blend of foot and hand, pointed at a shin.
We conducted a similar test, asking GPT-5 for an image of "a posse with six body parts labeled." After we clarified that Futurism wanted a labeled image rather than a text description, ChatGPT got to work; what it produced was, as shown below, even more laughably incorrect than Smith's results.
Clearly, GPT-5 is nowhere near as intelligent as a doctoral candidate, let alone one capable of actually earning a PhD. The takeaway: fact-check whatever a chatbot tells you, or skip the AI altogether and do your own research.
More on GPT-5: After Disastrous GPT-5, Sam Altman Pivots to Hyping Up GPT-6