Researchers tested ChatGPT’s ability to triage medical symptoms, and the results are raising serious questions about how people use AI for health advice

  • Tension: Millions of people are using ChatGPT as their first point of contact for medical symptoms, but a rigorous study found the AI correctly triaged urgency only about 56% of the time — performing worst on the emergencies where accuracy matters most.
  • Noise: ChatGPT’s polished, authoritative tone triggers fluency bias, making users trust its medical responses far more than the underlying accuracy warrants — especially among underinsured and rural populations who rely on it most.
  • Direct Message: The most dangerous health advice isn’t the kind that sounds wrong — it’s the kind that sounds so right you never think to question it, delivered by a system that has no way of knowing what you forgot to mention.

To learn more about our editorial approach, explore The Direct Message methodology.

Last March, Denise Kowalski — a 51-year-old middle school teacher in Columbus, Ohio — woke up at 2 a.m. with a squeezing sensation in her chest. Not sharp, not stabbing. More like someone had placed a warm hand firmly on her sternum and was pressing down. She didn’t call 911. She didn’t wake her husband. She opened ChatGPT on her phone and typed: “chest pressure, female, 51, slightly nauseous, what could this be?”

The chatbot responded with a list of possibilities — acid reflux, anxiety, musculoskeletal strain, and yes, cardiac events — but framed the most dangerous option as one of many equally weighted possibilities. It suggested she monitor her symptoms. Denise went back to sleep. Three days later, after the pressure returned during a brisk walk, her doctor ran an EKG and found evidence of a minor cardiac event — one that had likely occurred that night.

She was lucky. But the question her story raises isn’t really about luck.

A study published in early 2025 in BMJ Quality & Safety put ChatGPT through a rigorous test: researchers fed it 45 clinical vignettes — standardized patient scenarios used to train medical students — and asked it to triage the urgency of each case. The AI correctly identified the appropriate triage level only about 56 percent of the time. For emergency cases — the ones where getting it wrong could mean death — it performed significantly worse, frequently under-triaging urgent symptoms as routine or non-urgent.

Fifty-six percent. A coin flip with slightly better odds.

And yet, as a recent piece explored on this site, millions of people are now using AI chatbots as their first point of contact for health concerns. Not their second opinion — their first. The gap between what ChatGPT sounds like and what it actually knows has become one of the most consequential misunderstandings of the AI era.

Photo by Sanket Mishra on Pexels

There’s a psychological concept called fluency bias — the tendency to trust information more when it’s delivered smoothly, confidently, and in well-structured language. ChatGPT is, by design, a fluency machine. It doesn’t hedge. It doesn’t stammer. It doesn’t say “hmm, I’m not sure, let me think about this.” It produces polished paragraphs that read like they were written by a calm, knowledgeable physician with excellent bedside manner. And that calm is precisely the problem.

Rafael Gutierrez, a 34-year-old software developer in Austin, started using ChatGPT for health questions after his insurance deductible doubled last year. “I wasn’t trying to replace a doctor,” he told me. “I just wanted to know if something was worth going in for.” He asked about persistent headaches that came with visual disturbances. ChatGPT mentioned migraines, tension headaches, screen fatigue. It also mentioned — in passing, buried in a paragraph — that visual disturbances could indicate something neurological. Rafael focused on the migraine explanation because it came first and sounded right. He bought blue-light glasses. Four months later, an optometrist discovered early-stage papilledema — swelling of the optic nerve that can signal dangerously elevated intracranial pressure.

The AI technically mentioned the correct concern. But “technically mentioned” and “effectively communicated the risk” are wildly different things. A human doctor seeing Rafael in person — watching him squint, noting how he described the visual disturbances, picking up on the pattern of worsening — would have weighted the urgency entirely differently. ChatGPT has no body to read. No intuition trained on thousands of real patient encounters. No gut feeling.

What it has is pattern completion. And pattern completion without clinical judgment is a dangerous thing to mistake for medicine.

The BMJ study’s findings are especially alarming when you consider who’s most likely to rely on AI for health guidance: people without easy access to healthcare. A 2024 survey published in JAMA Network Open found that individuals with lower incomes, limited insurance, and those living in rural areas were disproportionately likely to use AI chatbots for symptom checking. These are the people who can least afford a wrong answer — and they’re the ones most likely to get one.

This creates something researchers are starting to call a diagnostic confidence gap — the distance between how certain someone feels about a health decision and how much that certainty is actually warranted. ChatGPT doesn’t just provide information; it provides information wrapped in the affect of authority. And for someone who’s already anxious, already hesitant to spend money on a doctor’s visit, that authority can be the thing that convinces them to stay home.

Consider what we’ve already seen with self-directed health decisions in other domains. As one account detailed on this site, years of stacking supplements without medical guidance produced bloodwork results that were actively worse than baseline. The impulse is the same: we trust accessible information over professional guidance because accessible information feels like empowerment. And sometimes it is. But the line between empowered and endangered is thinner than most people realize.

Photo by RDNE Stock project on Pexels

Nora Yuen, a 63-year-old retired nurse in Portland, Oregon, has watched this shift with a kind of quiet horror. She spent 30 years in emergency medicine. She knows what missed triage looks like. “The thing people don’t understand,” she said, “is that triage isn’t just about matching symptoms to a list. It’s about what’s not on the list. It’s about the thing the patient doesn’t think to mention because they don’t know it matters.” A slight change in skin color. Breathing that’s just a beat too fast. The way someone holds their arm.

Nora’s point cuts to something fundamental about what AI can and can’t do in healthcare — at least right now. ChatGPT processes text. It processes the words you choose, the way you frame your question, the details you think to include. But illness doesn’t always announce itself in the words we know to use. Sometimes the most important symptom is the one we dismiss, minimize, or simply don’t have the vocabulary for. A human clinician can catch what you didn’t say. An AI chatbot can only work with what you typed.

And the cultural momentum is only accelerating. Research increasingly shows how deeply our early experiences shape our bodies and brains in ways we’re only beginning to understand — which means the health questions we need answered are becoming more complex, more personal, more layered. Exactly the kind of questions that demand nuanced, individualized judgment rather than pattern-matched responses.

None of this means AI has no role in healthcare. It absolutely does — in research synthesis, in administrative efficiency, in helping physicians process enormous datasets. Some early AI diagnostic tools, built specifically for clinical environments with proper guardrails, are showing real promise. But there is a canyon-wide difference between AI designed for doctors and AI designed for conversation that people happen to use for doctoring.

ChatGPT is the latter. It was not built to keep you alive. It was built to be helpful, and it often is. But helpful and safe are not always the same thing, and in healthcare, the cost of confusing the two is measured in outcomes no chatbot will ever have to live with.

Denise Kowalski keeps her ChatGPT conversation from that night saved on her phone. She looks at it sometimes. “It reads like perfectly reasonable advice,” she said. “That’s what scares me. It sounded exactly like what a calm, smart person would tell you. And it was wrong in the one way that mattered.”

The most dangerous medical advice isn’t the kind that sounds wrong. It’s the kind that sounds so right you don’t question it — delivered by a voice so steady you forget it doesn’t actually know you’re in the room.

Feature image by Maksim Goncharenok on Pexels


Maya Torres

Maya Torres is a lifestyle writer and wellness researcher who covers the hidden patterns shaping how we live, work, and age. From financial psychology to health habits to the small daily choices that compound over decades, Maya's writing helps readers see their own lives more clearly. Her work has been featured across digital publications focused on personal development and conscious living.
