Data Science
Your AI Isn’t Hallucinating, It’s Confabulating.
AI doesn't think like a computer. It thinks like a human.
Published: 17.08.25
Picture this: you ask your company's new AI assistant to research a competitor's latest product launch. Within seconds, it delivers a comprehensive analysis complete with launch dates, pricing details, and market positioning. The analysis looks professional, sounds authoritative, and contains one glaring problem. Half of it never happened.
Welcome to the strange world of AI hallucinations, where your digital assistant confidently invents facts with the same conviction it presents real ones. These aren't random glitches or temporary bugs waiting for a software update. They're fundamental to how these systems work, and they're costing businesses serious money whilst confusing executives who thought they were buying computer-like precision.
Here's what every business leader needs to understand: AI doesn't think like a computer at all. It thinks like a very knowledgeable human with a serious case of false confidence syndrome.
The Nobel Winner Who Changed Everything
Geoffrey Hinton, the man who won a Nobel Prize for basically teaching computers to think, has spent his career watching AI evolve from party trick to business necessity. Now he's telling anyone who'll listen something that should fundamentally change how you think about AI reliability.
"Stop calling them hallucinations," Hinton insists. "Call them confabulations." The distinction matters enormously for business leaders trying to understand what they've actually bought.
Hallucinations suggest something's wrong with the system's perception. Confabulations mean the system is doing exactly what human brains do when they don't have perfect information. They fill in the gaps with plausible-sounding stories.
Hinton's favourite example comes from Watergate. John Dean testified under oath about detailed conversations in the Oval Office, describing precise exchanges between himself and President Nixon. When the tapes surfaced, they proved Dean had fabricated entire conversations. Not deliberately, mind you. His brain had done what brains do when reconstructing memories from fragments. It created plausible narratives that felt completely real.
Your AI assistant does the same thing. When it doesn't have direct access to information, it reconstructs what seems most likely based on patterns it learned during training. The result sounds authoritative because the system genuinely "believes" its reconstruction makes sense.
As Geoffrey Hinton has argued in public talks, predicting the next word requires representations that capture sentence‑level meaning. These systems don't store facts like filing cabinets. They compress knowledge into connection patterns, then regenerate information probabilistically. It's closer to how you remember your childhood than how Google searches its database.
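To make that concrete, here is a toy sketch of probabilistic regeneration. The numbers are entirely invented for illustration (a real model weighs tens of thousands of possible tokens, not four), but the mechanism is the point: the system assigns probabilities to plausible continuations and samples one, so a wrong-but-plausible answer comes out sounding just as confident as the right one.

```python
import random

# Invented next-token probabilities for the prompt "The product launched in ..."
# A real model scores tens of thousands of tokens; these four are illustrative.
next_token_probabilities = {
    "2019": 0.41,   # the correct year in this made-up example
    "2020": 0.33,   # plausible but wrong, and sampled roughly a third of the time
    "2021": 0.18,
    "never": 0.08,
}

tokens, weights = zip(*next_token_probabilities.items())
completion = random.choices(tokens, weights=weights, k=1)[0]

# Whichever token wins, the finished sentence reads with equal confidence.
print(f"The product launched in {completion}.")
```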
The implications are staggering. Despite having fewer connections than a human brain, GPT-4 knows hundreds of times more than any person because it has what Hinton calls "a much better learning algorithm than us." But that same algorithm makes confident mistakes as naturally as humans misremember yesterday's meeting.
Here's what worries me most: executives are still thinking about AI like they think about Excel. You're not dealing with a sophisticated calculator. You're working with something closer to a brilliant colleague who occasionally states obvious nonsense with complete confidence.
The Uncomfortable Truth About Better Models
Here's something that will make your procurement team's eyes twitch: the more sophisticated and expensive the AI model, the more convincingly it can tell you something completely wrong.
I've seen this firsthand in boardrooms across London and Manchester. Recent research from the companies building these systems reveals a paradox that executives find hard to swallow. Advanced models have dramatically reduced confabulation rates for grounded tasks like document summarisation, but they still fail on complex reasoning and can be confidently wrong when tackling multi-step problems.
Why does this happen? Think about it like hiring consultants. Give a junior analyst a complex strategic question and they might quickly admit they don't know. Hand the same question to a senior strategist and they'll construct an elaborate, logical-sounding analysis that could be completely wrong. The sophisticated model attempts more complex reasoning chains where errors compound exponentially.
The confidence problem runs deeper. AI systems are frequently overconfident, with calibration research showing that stated confidence often exceeds actual accuracy. Imagine if your finance director had this level of unreliable calibration when presenting quarterly forecasts.
Even state-of-the-art models score highly on summarisation benchmarks. That might sound impressive until you realise your customer service team processes thousands of queries daily, many involving reasoning well beyond simple summarisation, and that is precisely where these systems still slip up.
The business implication is brutal: upgrading to the most expensive, latest model won't necessarily reduce errors. For complex analytical work requiring multiple reasoning steps, you might actually get more confident nonsense.
When Confident Lies Cost Real Money
Let's be honest about what's really happening out there. The legal profession has become ground zero for AI disaster stories, and the penalties are real.
Dozens of cases across 2023 and 2024 have involved lawyers submitting completely fabricated case citations to courts, drawing sanctions in multiple jurisdictions. These weren't typos or minor inaccuracies. The AI had invented entire court cases, complete with realistic-sounding names and plausible legal precedents that never existed.
The penalties are real. Fines range from £5,000 to £10,000 per incident, plus mandatory ethics training and potential disbarment. One solicitor discovered too late that every case citation in their brief was AI-generated fiction. The judge was not amused.
Healthcare presents even more sobering examples. Investigations and studies have documented material errors in AI medical transcription and summaries, including errors of the kind that could affect patient care. In one study, AI-generated medical summaries contained significant errors in nearly half the cases reviewed.
Nobody wants to talk about this, but customer service disasters have become viral entertainment whilst destroying brand value. Air Canada learned this the hard way when their chatbot invented a bereavement fare policy that didn't exist. A customer's grandmother had died; he asked about compassionate rates, and the bot confidently explained a discount programme that was pure fiction. The court ordered Air Canada to honour the fake policy, costing them hundreds of pounds.
DPD's delivery chatbot made headlines for all the wrong reasons when it started writing poems criticising the company before being hastily shut down. A Chevrolet dealership's AI agreed to sell a new SUV for £1 after some creative prompt manipulation. McDonald's terminated their £40 million partnership with IBM after viral videos showed their drive-through AI serving frustrated customers bizarre orders.
Here's what I'm seeing in financial services: Major banks like JPMorgan Chase and Goldman Sachs initially restricted public ChatGPT access in 2023 while building internal AI tools with better controls. Even media outlets suffer credibility damage when AI-generated content goes wrong. Sports Illustrated was caught publishing AI-created articles with fake author profiles, whilst Bloomberg required dozens of corrections for AI-generated summaries.
The pattern is clear: AI's confident lies don't just cause embarrassment. They trigger legal costs, regulatory fines, and systematic damage to customer relationships.
What Actually Works (And What Doesn't)
The good news is that smart companies have figured out how to dramatically reduce AI confabulation rates. The bad news is there's no silver bullet. Success requires combining multiple approaches in ways that acknowledge AI's fundamental nature rather than fighting it.
Ground Your AI in Real Data
The most effective technique is something called Retrieval-Augmented Generation, though you don't need to remember the name. Think of it as giving your AI assistant a research department. Instead of relying purely on its training data, the system searches through your company's verified documents, databases, and knowledge bases before answering questions.
Medical organisations using this approach with verified databases have seen noticeably better factual grounding. Enterprise knowledge systems show similarly dramatic improvements over traditional search. The difference is night and day because the AI can point to specific sources rather than reconstructing information from memory.
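If your technical team wants a feel for the pattern, here is a minimal sketch of the retrieval-augmented workflow. The `search_documents` and `call_language_model` functions are hypothetical stand-ins for whatever document index and model provider you actually use; the shape of the workflow is what matters, not this particular implementation.

```python
# Minimal retrieval-augmented generation sketch (illustrative stand-ins only).

def search_documents(query: str, top_k: int = 3) -> list:
    """Return the most relevant verified documents for the query.
    In practice this would hit a vector or keyword index over your
    company's approved knowledge base."""
    return [{"id": "pricing-2024-q3.pdf",
             "text": "Competitor X lists Product Y at £499 per seat..."}]

def call_language_model(prompt: str) -> str:
    """Hypothetical stand-in for whichever model API you use."""
    return "Competitor X prices Product Y at £499 per seat [source: pricing-2024-q3.pdf]."

def answer_with_sources(question: str) -> str:
    # 1. Retrieve verified context first, rather than relying on training data.
    documents = search_documents(question)
    context = "\n\n".join(f"[{d['id']}]\n{d['text']}" for d in documents)

    # 2. Instruct the model to answer only from the retrieved context
    #    and to cite the document IDs it used.
    prompt = (
        "Answer the question using ONLY the sources below. "
        "Cite the source ID for every claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_language_model(prompt)

print(answer_with_sources("What is Competitor X charging for Product Y?"))
```

The key design choice is in step two: telling the model to refuse when the sources don't cover the question turns "I don't know" from a failure mode into a feature.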
Make AI Show Its Working
Simple prompt changes can cut error rates substantially. Instead of asking "What are our competitor's pricing strategies?" try "According to our market research database, what are our competitor's pricing strategies? Please cite specific sources for each claim."
Forcing the system to show its work and cite sources creates both better outputs and an audit trail. If something goes wrong, you can trace exactly where the information came from rather than trying to reverse-engineer AI reasoning.
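Here is a rough illustration of that prompting pattern combined with a basic audit trail. The `call_language_model` function is again a hypothetical stand-in, and the JSON-lines log file is just one simple way of keeping a traceable record of every exchange.

```python
import datetime
import json

def call_language_model(prompt: str) -> str:
    """Hypothetical stand-in for your model provider's API."""
    return ("Competitor X uses tiered pricing from £49 per month "
            "[source: market research database, report 17].")

def ask_with_audit(question: str, log_path: str = "ai_audit_log.jsonl") -> str:
    # Phrase the request so the model must ground and cite its answer.
    prompt = (
        "According to our market research database, answer the question below. "
        "Cite a specific source for each claim, and reply 'not found' if the "
        "database does not cover it.\n\n"
        f"Question: {question}"
    )
    answer = call_language_model(prompt)

    # Append the full exchange to a JSON-lines log so any claim can later
    # be traced back to the prompt and response that produced it.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "question": question,
            "prompt": prompt,
            "answer": answer,
        }) + "\n")
    return answer

print(ask_with_audit("What are our competitor's pricing strategies?"))
```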
Keep Humans in the Loop
Here's the uncomfortable reality: the most successful implementations aren't fully automated. They use AI to do the heavy lifting whilst keeping humans involved for verification and judgement calls. Healthcare studies show that 80% of AI diagnostic errors get caught when clinicians review the outputs.
The key is knowing where to focus human attention. Automated systems can flag responses that seem uncertain or inconsistent, creating continuous learning loops that improve performance over time. This doesn't mean every output needs human review, but the system should know when to ask for help.
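One simple way to approximate that "knows when to ask for help" behaviour is to ask the same question more than once and escalate when the answers diverge. The sketch below assumes a hypothetical `generate` function and a crude text-similarity check; real systems use richer uncertainty signals, but the principle is the same.

```python
from difflib import SequenceMatcher

def generate(prompt: str) -> str:
    """Hypothetical model call; sampling twice gives two independent answers."""
    return "The refund window for enterprise contracts is 30 days."

def needs_human_review(prompt: str, agreement_threshold: float = 0.8) -> bool:
    # Sample the same question twice. If the answers diverge noticeably, the
    # model is probably reconstructing rather than recalling, so the query
    # gets routed to a person instead of going straight to the customer.
    first, second = generate(prompt), generate(prompt)
    agreement = SequenceMatcher(None, first, second).ratio()
    return agreement < agreement_threshold

query = "What is our refund policy for enterprise contracts?"
print("Escalate to a human" if needs_human_review(query) else "Safe to answer automatically")
```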
Use Multiple AI Systems
When three different AI models agree on something, accuracy jumps significantly. Organisations running parallel AI systems and comparing outputs can spot inconsistencies before they become expensive mistakes. This redundancy might seem costly until you compare it to a single £10,000 legal sanction or million-pound regulatory fine.
The most sophisticated companies don't just run multiple models; they assign different roles to different systems. One AI might generate draft content, another checks it for accuracy, and a third evaluates whether it meets company standards. It's like having multiple editors review important documents.
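A stripped-down version of that role separation might look like the sketch below. All three model functions are invented placeholders; in practice each role could be a different provider, model size, or prompt, and anything the reviewers reject goes to a person rather than out the door.

```python
def drafting_model(task: str) -> str:
    # Placeholder for the model that produces the first draft.
    return "Competitor X launched Product Y in March 2024 at £499."

def fact_check_model(draft: str) -> bool:
    # Placeholder for a second model that verifies each claim against sources.
    return "March 2024" in draft and "£499" in draft

def standards_model(draft: str) -> bool:
    # Placeholder for a third model that checks tone and house style.
    return len(draft) < 500

def reviewed_output(task: str) -> str:
    draft = drafting_model(task)
    if fact_check_model(draft) and standards_model(draft):
        return draft
    return "Flagged for human review before release."

print(reviewed_output("Summarise Competitor X's product launch."))
```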
Setting Expectations That Actually Work
The companies getting real value from AI today aren't those waiting for perfect systems. They're the ones who've built robust processes assuming errors will happen and planning accordingly.
A smart law firm might use AI to draft initial contracts but requires solicitors to verify every legal citation. Healthcare organisations deploy diagnostic AI but maintain mandatory clinician review for anything beyond routine cases. Financial services companies use AI for market analysis but implement multiple validation steps before making trading decisions.
The mindset shift is crucial. Instead of viewing verification as overhead, treat it as essential infrastructure. Budget 15-25% of your AI project costs for validation processes. This isn't bureaucracy; it's the price of doing business with systems that think like brilliant, confident humans rather than precise computers.
Training becomes critical. Your team needs to understand not just how to use AI but how to recognise when it's likely wrong. Develop internal guidelines for spotting probable confabulations: vague sources, unusually specific statistics without attribution, claims that contradict known company policies.
Create escalation protocols that make sense. When should someone question an AI recommendation? What triggers human review? Who makes the final decision when AI and human judgement conflict? These questions need answers before you're under pressure to make quick decisions.
Most importantly, don't let perfect be the enemy of good. AI's confabulation tendency doesn't negate its value; it just means you need to work with it intelligently. The companies that crack this balance will capture transformative productivity gains whilst their competitors wrestle with perfectionist paralysis.
Learning from the Companies Getting It Right
The most instructive success stories come from organisations that faced confabulation crises early and built comprehensive solutions rather than abandoning AI altogether.
Specialist industrial compliance teams have transformed outcomes by training domain-specific AI on the regulatory documents that govern their sector. Instead of replacing legal expertise, these implementations accelerate staff workflows and improve consistency, delivering both efficiency gains and better accuracy when paired with clear governance.
In healthcare, provider organisations have deployed virtual care navigators that automate routine patient interactions while using built‑in safeguards to escalate uncertain or higher‑risk cases to clinicians. This human‑in‑the‑loop design keeps routine queries efficient and routes complex medical issues to qualified professionals immediately.
Even creative industries are finding value in controlled confabulation. David Baker's laboratory used AI's tendency to "imagine" non-existent proteins to design millions of new molecules in groundbreaking research. Marketing teams use similar approaches, embracing AI's creative confabulations for brainstorming whilst maintaining strict fact-checking for final content.
The pattern across successful implementations is consistent: these companies view confabulation as a manageable operational risk rather than a deal-breaker. They've built systems that harness AI's creativity whilst protecting against its overconfidence.
The Path Forward
AI confabulation isn't a temporary problem awaiting a technical fix. It's fundamental to how these systems achieve their remarkable capabilities. Geoffrey Hinton's insight cuts to the heart of it: AI thinks like humans, not computers. This makes confident errors as natural as human misremembering.
Here's what you need to do right now:
Invest in grounding AI responses in verified data. This isn't optional infrastructure; it's the foundation of reliable AI deployment.
Implement human oversight for important decisions. Build this into your workflows from day one, not as an afterthought when something goes wrong.
Deploy multiple AI systems to cross-check critical outputs. The cost of redundancy is nothing compared to the cost of confident mistakes.
Budget 15-25% of AI project costs for validation processes. Companies that skip this step pay far more in corrections and crisis management.
Those who view confabulation as manageable operational risk rather than existential threat will capture AI's transformative value whilst others hesitate. The question isn't whether AI will make things up. The question is whether your organisation is ready when it does.
After all, humans have been confidently wrong about things for millennia, and we've built entire civilisations around that limitation.
Time to extend the same courtesy to our newfound artificial friends.
Tags:
#datascience
#decisionmaking
#hallucination