- Tension: ChatGPT’s health feature responded to textbook medical emergencies with calm reassurance and lifestyle suggestions, and real patients nearly died because of it.
- Noise: The tool’s fluent, authoritative tone triggers automation bias — people defer to confident-sounding AI even when their own bodies are telling them something is catastrophically wrong.
- Direct Message: An AI that sounds like a doctor but can’t recognize when you’re dying isn’t a healthcare tool — it’s a liability disguised as progress, and the people most at risk are the ones most alone when they use it.
To learn more about our editorial approach, explore The Direct Message methodology.
Last Tuesday at 11:47 p.m., Diane Hollis, a 58-year-old retired school librarian in Flagstaff, Arizona, typed her symptoms into ChatGPT’s new health feature. Crushing chest pressure. Tingling down her left arm. Shortness of breath that had come on suddenly while watching television. The chatbot responded with a calm, measured paragraph about possible causes of chest discomfort, including anxiety, acid reflux, and muscle strain. It suggested she try deep breathing exercises and consider scheduling an appointment with her doctor if the symptoms persisted.
Diane’s daughter, who happened to call ten minutes later, heard something wrong in her mother’s voice and drove her to the emergency room. Diane was having a heart attack. The ER physician who treated her later said the delay could have been fatal.
Diane’s story is not an outlier. It is quickly becoming a pattern.
When OpenAI rolled out its health reasoning capabilities for ChatGPT in late 2024 and expanded them in 2025, the promise was seductive: AI that could help people understand their symptoms, interpret lab results, and navigate the overwhelming complexity of modern healthcare. For millions of people without easy access to a doctor, or those who simply wanted a second opinion before making an appointment, it felt like progress. And in some cases, it has been. The tool can synthesize medical literature with striking fluency. It can explain bloodwork in plain language. As one recent account demonstrated, people are already using AI outputs to have more informed conversations with their physicians.
But fluency is not the same thing as safety. And that distinction is proving to be a dangerous one.

Independent testing by physicians and AI safety researchers has exposed alarming failures in ChatGPT’s ability to recognize medical emergencies. In scenarios designed to simulate classic presentations of heart attacks, strokes, pulmonary embolisms, and ectopic pregnancies, the system repeatedly failed to recommend immediate emergency care. Dr. Ayo Ademola, an emergency medicine physician in Atlanta who participated in one round of structured testing, described the results as “genuinely frightening.” In multiple cases, the chatbot offered lifestyle recommendations or suggested booking a routine appointment for symptoms that any first-year medical student would recognize as a 911 call.
A 2024 study published on arXiv examining large language models’ triage accuracy found that ChatGPT correctly identified urgent and emergent conditions only about 60% of the time, a rate that researchers described as insufficient for any real-world health application. The false reassurance problem, where the model confidently suggests a benign explanation for a serious symptom, was the most common and most dangerous failure mode.
This matters because of how people actually behave. When someone is scared and uncertain, they are looking for permission to calm down. A soothing paragraph from an AI that sounds authoritative provides exactly that, even when the correct response would be to instill urgency. Marcus Chen, a 34-year-old software developer in Portland, described his own experience after waking with a sudden, severe headache and visual disturbances. “It told me it was probably a tension headache and to hydrate,” he said. “I almost went back to sleep.” His wife insisted they go to the ER. He had a subarachnoid hemorrhage. Researchers have been raising exactly these questions about how people use AI for health advice, and the data keeps confirming the worst fears.
The psychological mechanism at play has a name: automation bias. It’s the well-documented tendency for humans to defer to the output of a computer system, especially one that presents information with confidence and apparent authority. A 2018 study in the International Journal of Human-Computer Studies showed that people follow automated recommendations even when those recommendations contradict their own observations. In medicine, this is catastrophic. A patient who feels something is deeply wrong but reads a reassuring AI response is fighting not just their symptoms but the persuasive weight of a system that sounds like it knows what it’s talking about.
OpenAI has included disclaimers, of course. The fine print tells users that ChatGPT is not a substitute for professional medical advice and that they should seek emergency care when appropriate. But disclaimers operate in a rational, careful headspace that has almost nothing to do with how a person interacts with technology at midnight when they’re in pain and afraid. Nadia Okafor, a 41-year-old nurse practitioner in Chicago who has been vocal about the risks, put it bluntly: “Nobody reads the disclaimer when their chest hurts. They read the answer.”

What makes this moment particularly fraught is the cultural context. We are living through a period of unprecedented enthusiasm for AI capabilities. As previous reporting has detailed, each new round of testing reveals the same core failures, yet adoption continues to accelerate. The gap between what the tool can do (synthesize information impressively) and what people expect it to do (keep them safe) widens with every update. And the company’s marketing walks a careful line, celebrating the health feature’s potential while technically never claiming it replaces a doctor, knowing full well that’s exactly how millions of people will use it.
There’s a parallel to something happening in other areas of health optimization. Research into GLP-1 drugs like Ozempic has revealed that powerful interventions can reshape behavior in ways we don’t fully understand yet. The same principle applies here, but inverted. A tool that reshapes health-seeking behavior by providing false reassurance at critical moments is an intervention too, just one nobody prescribed and nobody is monitoring.
The deeper issue is something researchers call “diagnostic anchoring,” where the first explanation a person encounters becomes the frame through which they interpret everything that follows. If ChatGPT tells you your crushing chest pain might be acid reflux, you start noticing the things that fit that explanation. You remember the spicy food you ate. You recall that you’ve been stressed. The alternative explanation, the one that requires an ambulance, recedes. And every minute it recedes, the window for intervention narrows.
Dr. Ademola and a growing coalition of emergency physicians have called for mandatory emergency triage protocols built into any consumer-facing health AI. The proposal is straightforward: when a user describes a constellation of symptoms that matches known emergency presentations, the system should immediately and unambiguously direct them to call 911 or their local emergency number. Before any other information. Before reassurance. Before differential diagnoses. The technology to do this exists. Symptom-checker algorithms with strong emergency flagging have been around for over a decade. The question is whether companies will prioritize safety over the conversational smoothness that makes their products feel magical.
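To make the proposal concrete, here is a minimal sketch of what such a gate could look like. Everything in it is illustrative: the symptom patterns, the two-flag threshold, and the function names are assumptions for the sake of example, not OpenAI’s implementation or a validated clinical rule set. A real system would draw on a clinically vetted triage knowledge base rather than a handful of regular expressions.

```python
"""
Minimal sketch of a rule-based emergency triage gate that would run before
any conversational reply is generated. The patterns, threshold, and names
here are illustrative assumptions, not a clinical standard or vendor code.
"""
import re

# Hypothetical red-flag patterns keyed to classic emergency presentations.
RED_FLAGS = {
    "possible heart attack": [
        r"chest (pressure|pain|tightness)",
        r"arm.*(tingl|numb)|(tingl|numb).*arm",
        r"short(ness)? of breath",
    ],
    "possible stroke": [
        r"face.*droop",
        r"slurred speech",
        r"sudden.*(weakness|numbness).*(one side|left|right)",
    ],
    "possible brain bleed": [
        r"(worst|sudden).*(severe )?headache",
        r"headache.*(vision|visual)",
    ],
}

EMERGENCY_DIRECTIVE = (
    "Your symptoms could indicate a medical emergency ({reason}). "
    "Stop reading this and call 911 or your local emergency number now."
)


def triage_gate(user_text: str) -> str | None:
    """Return an unambiguous emergency directive when at least two red flags
    for the same presentation match; otherwise return None so the normal
    conversational flow can proceed."""
    text = user_text.lower()
    for label, patterns in RED_FLAGS.items():
        hits = sum(bool(re.search(p, text)) for p in patterns)
        if hits >= 2:  # two independent red flags reduce false alarms
            return EMERGENCY_DIRECTIVE.format(reason=label)
    return None


if __name__ == "__main__":
    report = "Crushing chest pressure, tingling down my left arm, short of breath."
    print(triage_gate(report) or "No emergency flags; continue normal conversation.")
```

The detail that matters is the ordering, not the pattern matching: the gate runs before the model writes anything at all, so the directive to call for help cannot be buried under, or softened by, a fluent and reassuring paragraph.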
Diane Hollis recovered. So did Marcus Chen. They were lucky in a specific way: someone else in the room overrode the AI’s assessment. Diane’s daughter heard something wrong in her voice. Marcus’s wife trusted her gut more than the screen. The uncomfortable question is what happens to the people who are alone. The 67-year-old widower checking symptoms at 2 a.m. The college student in a dorm room who doesn’t want to “overreact.” The person in a rural area who is already skeptical about whether their symptoms are serious enough to justify a two-hour drive to the nearest hospital.
For them, the chatbot isn’t a second opinion. It is the only opinion. And right now, that opinion is getting the most important calls wrong.
We keep reaching for technology to close the gaps in our healthcare system, and the instinct makes sense. Access is uneven. Costs are staggering. Wait times are absurd. An AI that could genuinely help people navigate all of that would be transformative. But the prerequisite for that transformation is a tool that knows the limits of its own knowledge, one that can say, with clarity and force: Stop reading this. Call for help now.
Until that’s the default, we’re building a system that sounds like a doctor, reads like a doctor, and fails like nothing a doctor ever would.