Join Nostr
2026-04-15 21:19:57 UTC

Miguel Afonso Caetano on Nostr: RT @HedgieMarkets 🦔A study from Mass General Brigham tested 21 AI models on ...

RT @HedgieMarkets
🦔A study from Mass General Brigham tested 21 AI models on medical diagnosis tasks. For differential diagnosis with incomplete patient information, all models had error rates above 80%. With more complete data, error rates fell below 40%, and the best performers reached 90% accuracy. The models consistently narrowed to a single diagnosis rather than suggesting a range of possibilities. One in three US adults has used AI chatbots for medical advice in the past year. Google and Amazon are both developing dedicated medical chatbots.

My Take
An error rate above 80% for differential diagnosis with incomplete information describes the precise scenario most people are in when they turn to AI with a health concern. Nobody walks in with a complete patient file. They describe symptoms, worry about something specific, and ask what it might be. That's exactly when these models fail most.

The way these models fail is as important as how often. They narrow to a single confident answer rather than offering a range of possibilities. When a doctor says it could be one of several things, that's honesty about uncertainty. When an AI says it's probably this one thing, it reads as authority. I've covered the cognitive surrender research showing people accept AI outputs without scrutiny 73% of the time even when the AI is wrong.

Confident wrong answers in a medical context are a different order of problem than confident wrong answers about anything else. Google and Amazon are both racing to release dedicated medical chatbots into this environment while disclaiming clinical responsibility, and I think that deserves serious regulatory attention before it causes widespread harm.

Hedgie🤗

Link to study: https://massgeneralbrigham.org/en/about/newsroom/press-releases/ai-chatbot-lacks-clinical-reasoning