ChatGPT’s new health feature failed to recognize three common medical emergencies in testing, and experts are calling it ‘unbelievably dangerous’

  • Tension: ChatGPT’s health feature failed to flag three common medical emergencies in independent testing, including atypical heart attacks, early stroke symptoms, and diabetic ketoacidosis — the everyday emergencies that kill people precisely because they don’t look dramatic enough.
  • Noise: OpenAI positions the tool as informational, not diagnostic, but the conversational design creates a sense of consultation that overrides disclaimers. Automation complacency and the fluency heuristic mean users trust the AI more when it sounds confident, regardless of accuracy.
  • Direct Message: ChatGPT’s health feature fails because it’s optimized to satisfy, not to save. The most life-preserving medical response is often the most unsatisfying one — “I don’t know, go now” — and that’s the one an AI built on fluency will never naturally give.

To learn more about our editorial approach, explore The Direct Message methodology.

Last Tuesday at 11:47 p.m., Rachel Okafor, a 34-year-old graphic designer in Atlanta, typed her symptoms into ChatGPT’s health feature: sudden crushing pressure in her jaw, sweating, and nausea that had started twenty minutes earlier. She’d been reading about anxiety attacks online and figured this was another one. ChatGPT agreed. It suggested deep breathing exercises, hydration, and a follow-up with her primary care doctor “when convenient.”

Rachel’s roommate, who happened to be a nursing student, walked into the kitchen, took one look at her, and called 911. At Grady Memorial, the ER team confirmed Rachel was having a heart attack. The kind that kills women under 40 at disproportionate rates precisely because the symptoms don’t look like what we’ve been trained to expect.

Rachel is fine now. But her story keeps replaying in my head, because it sits at the center of something we’re all sleepwalking into.

OpenAI launched its health feature with considerable fanfare, positioning ChatGPT as a kind of medical co-pilot for the curious and the anxious alike. Millions of people had already been using the chatbot for symptom-checking anyway, so the company’s logic made a certain sense: if people are going to do this regardless, give them a better version of it. The problem is that “better” turned out to be a dangerously elastic word.

Independent researchers recently put ChatGPT’s health capabilities through structured testing, and as we covered in detail, the results were alarming. The AI failed to flag three common medical emergencies: atypical heart attack presentations (particularly in women), early-stage stroke symptoms that didn’t follow the textbook “FAST” acronym, and diabetic ketoacidosis in patients who didn’t know they were diabetic. In each case, the chatbot offered reassurance where urgency was required.

Dr. Vivek Murali, a 51-year-old emergency physician in Chicago who participated in the review, called the findings “unbelievably dangerous.” His concern wasn’t that the AI got obscure diagnoses wrong. His concern was that it missed the bread and butter of emergency medicine: recognizing when something common is trying to kill you.

Photo by Matheus Bertelli on Pexels

There’s a psychological phenomenon that researchers call automation complacency, the well-documented tendency for humans to lower their guard when a machine is handling the thinking. A 2017 study published in Human Factors found that people consistently trust automated systems even after being shown evidence of their failures. The trust isn’t rational. It’s almost gravitational. We want the machine to be right because the alternative, sitting with uncertainty, is unbearable.

This is what makes the ChatGPT health situation different from, say, Googling your symptoms at 2 a.m. Google gives you a chaotic wall of links, some terrifying, some useless. The chaos itself is a kind of built-in safety mechanism. You know you’re wading through noise. ChatGPT, by contrast, gives you a single, confident, conversational answer. It feels like talking to someone who knows. And that feeling is the danger.

Consider Marcus Tan, a 28-year-old software developer in Austin. Marcus described to ChatGPT a scenario involving sudden blurred vision in one eye, a mild headache, and tingling in his left hand. These are textbook warning signs of a transient ischemic attack, often called a “mini-stroke,” which frequently precedes a full stroke within hours or days. The chatbot’s response focused on screen fatigue and tension headaches. It recommended reducing blue light exposure. As researchers have explored, this pattern of confident misdirection keeps surfacing in triage testing.

Marcus didn’t actually have a TIA. He was part of the testing group, feeding the AI scripted symptom profiles drawn from real emergency cases. But someone out there, tonight, is typing those exact symptoms into the same tool and getting that same answer. The math on this is simple and grim: ChatGPT has over 200 million weekly active users. Even a small percentage using it for health questions means millions of interactions. Even a small failure rate means thousands of people getting the wrong answer at the worst possible moment.
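
To make that back-of-envelope math concrete, here is a rough sketch. Only the 200 million weekly active users figure comes from the paragraph above; the share of users asking health questions and the failure rate are purely illustrative assumptions, not reported numbers.

```python
# Rough back-of-envelope sketch of the scale argument above.
# Only the weekly-user figure comes from the article; the other
# two numbers are illustrative assumptions, not reported data.

weekly_users = 200_000_000        # ChatGPT weekly active users (from the article)
share_asking_health = 0.02        # assume 2% ask a health question in a given week
miss_rate = 0.001                 # assume 0.1% of those answers miss an emergency

health_queries = weekly_users * share_asking_health        # 4,000,000 per week
missed_emergencies = health_queries * miss_rate            # 4,000 per week

print(f"Health questions per week: {health_queries:,.0f}")
print(f"Dangerously wrong answers per week: {missed_emergencies:,.0f}")
```

Even with deliberately conservative assumptions, "a small percentage" of a user base that large produces millions of health interactions, and "a small failure rate" still translates into thousands of people getting the wrong answer each week.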

The cultural context matters here, too. We’re living through a strange era where trust in institutions, including the medical establishment, has eroded significantly while trust in technology companies has paradoxically grown. A 2023 study in the Journal of Medical Internet Research found that nearly 70% of respondents who used AI chatbots for health information rated the experience as “helpful” or “very helpful,” even when the information provided was later assessed as incomplete or inaccurate by clinicians. People rated helpfulness based on how the interaction felt, not on clinical accuracy.

This is what psychologists call the fluency heuristic: the smoother and more coherent information sounds, the more true it seems. ChatGPT is, above all else, a fluency machine. It sounds like it knows what it’s talking about because sounding like it knows what it’s talking about is the entire basis of how it was built.

Photo by RDNE Stock project on Pexels

Dr. Priya Chandrasekaran, a 44-year-old internist in Portland, told me she’s started asking every new patient the same question: “Have you checked your symptoms with an AI before coming here?” About a third say yes. Some come in with printouts of ChatGPT conversations, annotated and highlighted, like evidence submitted to a court. “The patients aren’t foolish,” she said. “They’re doing exactly what we’ve trained them to do: take charge of their health, do their research, be informed consumers. We just never expected the research tool to sound so authoritative while being so unreliable.”

This echoes something we’ve seen in other domains. As one reader’s experience with supplement stacking showed, the gap between “sounds scientific” and “is actually good for you” can be enormous. The packaging of authority matters as much as the authority itself.

OpenAI has acknowledged limitations in its health feature, noting that it’s meant to “inform, not diagnose.” But this disclaimer operates in the same category as the surgeon general’s warning on a cigarette pack. Technically present. Functionally invisible. The user experience of ChatGPT’s health tool is designed to feel like a consultation, not a search engine. And when something feels like a consultation, people treat it like one.

There’s also the equity dimension. Darren Whitfield, a 39-year-old warehouse supervisor in Memphis, doesn’t have a primary care physician. He hasn’t had one in six years. His employer offers a high-deductible health plan with a $4,000 threshold before coverage kicks in. For Darren, ChatGPT isn’t a convenience or a novelty. It’s the closest thing to medical advice he can afford. And for the millions of people in similar positions, the stakes of AI getting it wrong aren’t abstract. They’re measured in hours, in ambulance rides that didn’t happen, in the distance between a kitchen table and an emergency room.

The uncomfortable truth that keeps surfacing in all of this testing, all of these expert interviews, all of these near-miss stories, is that ChatGPT’s health feature doesn’t fail because it’s stupid. It fails because it’s optimized for the wrong thing. It’s optimized to satisfy. To resolve the anxiety of the question. To give you something that feels like an answer. And in most of life, that’s fine. In medicine, satisfaction and accuracy are often at war with each other. The correct answer to “Am I having a heart attack?” is frequently “I don’t know, and you need to go somewhere right now where someone can find out.” That answer is deeply unsatisfying. It resolves nothing. And it saves lives.

Rachel Okafor keeps her hospital discharge papers in a folder on her desk. She told me she’s not angry at ChatGPT, exactly. She’s angry at how easy it was to believe it. How natural it felt to type instead of dial. How the calm, measured response on the screen quieted the alarm her body was trying to sound.

The technology will improve. The models will get better at triage. Guardrails will tighten. But the fundamental tension won’t go away, because it lives in us, not in the software. We want certainty more than we want truth. We want comfort more than we want accuracy. And the machines we’re building are getting extraordinarily good at giving us exactly what we want.

Feature image by Matheus Bertelli on Pexels


Maya Torres

Maya Torres is a lifestyle writer and wellness researcher who covers the hidden patterns shaping how we live, work, and age. From financial psychology to health habits to the small daily choices that compound over decades, Maya's writing helps readers see their own lives more clearly. Her work has been featured across digital publications focused on personal development and conscious living.
