A new study tested ChatGPT’s ability to triage medical symptoms, and the results are raising serious questions about how millions now self-diagnose

  • Tension: Millions of Americans are now using ChatGPT to triage their medical symptoms in the middle of the night — and a major new study shows the AI gets it dangerously wrong in both directions, missing real emergencies and over-escalating minor ones.
  • Noise: The conversation fixates on whether AI is ‘smart enough’ for medicine, but the real driver is a healthcare access crisis so severe that a chatbot disclaimer doesn’t stand a chance against 26-day wait times and 2 a.m. fear.
  • Direct Message: We’re not choosing machines over doctors — we’re reaching for whatever reaches back in the growing silence between feeling scared and getting help, and AI’s calm reassurance is most dangerous precisely because it feels like care without carrying any of its weight.

To learn more about our editorial approach, explore The Direct Message methodology.

Last March, Denise Harmon — a 51-year-old paralegal in Tucson — woke up at 2 a.m. with a sharp, radiating pain beneath her left shoulder blade. Her husband was asleep. Her doctor’s office wouldn’t open for six hours. So she did what roughly 80 million Americans now do when something feels wrong in their body at an inconvenient hour: she opened ChatGPT and typed her symptoms.

The chatbot told her the pain was likely musculoskeletal — probably from sleeping in an awkward position. It suggested ibuprofen, gentle stretching, and a follow-up with her physician if it persisted. Denise took two Advil and went back to bed.

Four days later, she was in the ER with a pulmonary embolism.

“I’m not an idiot,” Denise told me over the phone. “I know ChatGPT isn’t a doctor. But at two in the morning, when you’re scared and alone with your phone, the thing that talks back to you like it knows what it’s talking about? That becomes your doctor.”

She’s not wrong. And a new study just proved exactly why that’s so dangerous — and so understandable at the same time.

Published in April 2025 in BMJ Quality & Safety, the study from researchers at UC San Francisco and Harvard tested ChatGPT-4 and other large language models on 200 clinical vignettes — standardized case descriptions used to train medical students in triage. Each vignette described a patient presenting with specific symptoms, and the AI was asked to do what an emergency department nurse or a primary care intake line would do: assess urgency and decide whether the patient needs immediate emergency care, urgent same-day care, or can safely wait.

The results were striking. ChatGPT correctly triaged emergencies about 60 percent of the time — which sounds passable until you realize it missed the urgency in four out of ten genuine emergencies. And in cases that were truly non-urgent, it over-triaged roughly half the time, telling patients to rush to the ER for things that could safely wait.

In other words, it was confidently wrong in both directions.

[Image: AI medical diagnosis. Photo by Los Muertos Crew on Pexels]

The problem isn’t that people are gullible. The problem is that the interface is designed — by nature, if not by intent — to feel trustworthy. When ChatGPT responds to your symptoms, it doesn’t hedge the way a search engine does. It doesn’t show you ten blue links and let you spiral. It speaks to you in calm, organized paragraphs. It asks follow-up questions. It mirrors the cadence of a clinician so precisely that researchers have a term for what happens next: automation trust transfer — the cognitive leap where a user assigns a machine the authority of a human expert because it behaves like one.

And this is happening at a massive scale. A Pew Research Center survey from late 2024 found that nearly one in four U.S. adults had used an AI chatbot for health-related questions. Among adults under 35, that number jumped to almost 40 percent.

Rafael Medina, a 34-year-old graphic designer in Chicago, is one of them. He’d been experiencing intermittent chest tightness and fatigue for weeks. His primary care physician had a six-week waitlist. He told ChatGPT his symptoms, and the AI walked him through a differential that included anxiety, acid reflux, and deconditioning. It recommended breathing exercises and suggested he monitor his symptoms. “It felt like talking to a really smart friend who happened to know medicine,” Rafael said. When he finally saw his doctor — not six weeks later, but eight, because of a scheduling snafu — he was diagnosed with myocarditis, an inflammation of the heart muscle that can become life-threatening if untreated.

Rafael doesn’t blame ChatGPT, exactly. But he does blame the system that made ChatGPT feel like the only accessible option. “I have insurance,” he said. “I have a doctor. And I still couldn’t get seen. So what was I supposed to do — just sit with the fear?”

This is the part of the conversation that most coverage of the BMJ study misses. The AI triage problem isn’t primarily a technology problem. It’s an access problem wearing a technology mask. Americans turn to chatbots for the same reason they turn to urgent care clinics, Dr. Google, and that nurse cousin who’s always on Facebook — because the front door of the healthcare system is frequently locked, and even when it’s open, there’s a line out the door.

As we’ve explored before on DMNews, the systems we build for later life often fail in the most human ways — not through dramatic collapse, but through slow, quiet inaccessibility. Healthcare is no different. The average wait time to see a new primary care physician in the U.S. is now 26 days. In some specialties, it’s months. And that gap — between the moment you feel something wrong and the moment someone qualified actually looks at you — is exactly the space AI has rushed to fill.

But here’s where the complexity deepens. The BMJ study also found something the headlines largely ignored: for non-emergency symptom education — explaining what a condition is, what to expect, what questions to ask your doctor — ChatGPT performed remarkably well. It was fluent, thorough, and in many cases more patient and detailed than a time-pressured physician.

[Image: worried patient with smartphone. Photo by Kampus Production on Pexels]

Lorraine Ku, a 67-year-old retired teacher in Portland, uses ChatGPT specifically this way. After being diagnosed with Type 2 diabetes last year, she started asking the chatbot to explain her lab results, break down medication side effects, and help her formulate questions for her endocrinologist. “My doctor gives me maybe twelve minutes,” Lorraine said. “ChatGPT gives me as long as I need.” She’s careful to note that she never uses it for acute symptoms — a distinction that, researchers say, most users don’t make.

That distinction — between understanding and deciding — is the crux of everything. There’s a concept in behavioral psychology called action bias, the tendency to prefer doing something over doing nothing when we feel uncertain. When you’re anxious about a symptom at midnight, ChatGPT doesn’t just give you information. It gives you a next step. Ibuprofen. Stretching. “Monitor and follow up.” And a next step, even a wrong one, feels profoundly better than sitting in the dark not knowing. This is part of a broader pattern psychologists have identified about how we manage uncertainty as we age — sometimes the strategies that make us feel safer are the ones quietly putting us at risk.

And the AI companies know this. OpenAI’s own usage guidelines state that ChatGPT “should not be used as a substitute for professional medical advice.” But the product itself is designed to be conversational, responsive, and accommodating — qualities that make it feel like exactly the kind of substitute it disclaims being. It’s a bit like putting a neon “OPEN” sign on a door with a small “please don’t enter” sticker at eye level.

The BMJ researchers recommended that AI companies build explicit triage safeguards — hard stops that interrupt the conversation when high-risk symptom patterns emerge and direct users to call 911 or go to the nearest emergency room. Some of this infrastructure is being tested. But as of now, it’s voluntary. There’s no regulation requiring it.

Meanwhile, the people most likely to rely on AI for medical triage are the same people least likely to have a robust relationship with a primary care physician — younger adults, uninsured or underinsured populations, rural residents, and people working multiple jobs who can’t take a Tuesday morning off for a doctor’s appointment. As research continues to show, the health behaviors that seem small or incidental — when you eat, how you sleep, who you ask when something hurts — often carry outsized consequences that only reveal themselves over years.

What haunts me about Denise’s story isn’t that ChatGPT got her triage wrong. Triage is hard. Even experienced nurses get it wrong sometimes. What haunts me is the moment she described — alone in the dark, phone glowing, a machine speaking to her in a calm, assured tone, and the wave of relief she felt when it told her she was probably fine. That relief was real. And it was the most dangerous part of the entire interaction.

Because the thing about AI-generated reassurance is that it doesn’t carry any of the weight that real reassurance does. When a doctor tells you you’re okay, they’re putting their license, their training, and their professional judgment behind that statement. When ChatGPT tells you you’re okay, nothing is behind it. No liability. No follow-up. No 3 a.m. phone call to check on you. Just a string of statistically probable words arranged to sound like care.

We’re not becoming less intelligent about our health. We’re becoming more alone in managing it. And in that aloneness, we’re reaching for whatever reaches back. The question isn’t whether AI should play a role in healthcare — it almost certainly will, and in many contexts, it should. The question is whether we’re honest about what that reach really means: not a preference for machines over doctors, but a confession that the space between feeling scared and getting help has become so vast that a chatbot can live inside it, comfortably, for weeks.

Denise is fine now. She’s on blood thinners. She has a new pulmonologist. And she still uses ChatGPT — for recipes, for work emails, for trivia. But not for symptoms. “That part’s over,” she said. “I learned what it costs to hear what you want to hear from something that doesn’t care if you live or die.”

Not because the technology failed her. Because it performed exactly as designed — and that, it turns out, was the problem all along.

Feature image by Gustavo Fring on Pexels


Maya Torres

Maya Torres is a lifestyle writer and wellness researcher who covers the hidden patterns shaping how we live, work, and age. From financial psychology to health habits to the small daily choices that compound over decades, Maya's writing helps readers see their own lives more clearly. Her work has been featured across digital publications focused on personal development and conscious living.
