Health

Multimodal Supremacy: How GPT-4V is Diagnosing Diseases Faster Than Doctors

By Melissa Hogan | April 17, 2026 | 5 Mins Read

Somewhere in a hospital, a radiologist sits in near-darkness, eyes scanning image after image on two large monitors, making decisions that will change lives. You've probably walked past that room without giving it a second thought. It's a very human process: cautious and slow. And somewhere in a humming data center, a machine is performing a remarkably similar task, only faster and sometimes more effectively.

OpenAI's multimodal AI model, GPT-4V, has been quietly gaining attention in the medical research community. The model achieved 81.6% accuracy on the New England Journal of Medicine's Image Challenge, a diagnostic quiz designed to challenge experienced physicians.

Full name: GPT-4 with Vision (GPT-4V)
Developed by: OpenAI, San Francisco, California
Release year: 2023 (multimodal capability added to GPT-4)
Model type: Multimodal large language model (LLM) that processes both text and images
Medical accuracy (NEJM Image Challenge): 81.6%, vs. 77.8% for human physicians
USMLE performance (Step 3): 88.9% accuracy on image-based questions
Key strength: Simultaneous analysis of medical images and clinical text
Key limitation: Flawed image rationales in 35.5% of correct answers; image misunderstanding in 76.3% of wrong answers
Error reduction with expert guidance: Average 40% reduction in errors when paired with human physicians
Clinical readiness: Not yet approved for standalone clinical use; requires further validation
Benchmarks used: NEJM Image Challenge, USMLE Steps 1–3, Diagnostic Radiology Qualifying Core Exam
Comparable AI systems: GPT-3.5 Turbo, GPT-4 (text-only), Med-PaLM 2 (Google)

The doctors it was evaluated against? 77.8%. That gap is big enough to catch your attention and small enough to give you pause. Most readers probably still picture AI medicine as something ten years away. It isn't.

It's not just intelligence that sets GPT-4V apart from previous systems; it's the ability to look. Earlier large language models, such as GPT-3.5 Turbo and even the text-only GPT-4, could read a patient's medical history, parse clinical notes, and suggest differential diagnoses.


However, when presented with an X-ray or a photograph of a skin lesion, those models were blind. GPT-4V changed that. By processing written and visual data at the same time, it handles the kind of multimodal reasoning that medicine almost always demands. A patient arrives with scans as well as symptoms, not symptoms alone. The machine can now see both.
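To make "processing written and visual data at the same time" concrete, the sketch below assembles a single request that pairs a scan with its clinical note, using the content-part shape of OpenAI's vision-capable Chat Completions API. The model name and field values are illustrative, the image bytes are a placeholder, and nothing is sent over the network; this is a shape sketch, not a clinical tool.

```python
import base64


def build_multimodal_request(image_bytes: bytes, clinical_note: str) -> dict:
    """Assemble a chat-completion payload pairing an image with text.

    Mirrors the content-part layout used by OpenAI's vision-capable chat
    models (model name illustrative); nothing is transmitted anywhere.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4-vision-preview",  # illustrative model name
        "messages": [
            {
                "role": "user",
                "content": [
                    # The written half: history, symptoms, the question.
                    {"type": "text", "text": clinical_note},
                    # The visual half: the scan, inlined as a data URL.
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{encoded}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }


# Example: a placeholder "scan" and its note travel in one request.
payload = build_multimodal_request(
    image_bytes=b"\x89PNG...",  # placeholder bytes, not a real scan
    clinical_note="54-year-old with two weeks of dyspnea; chest X-ray attached.",
)
print(payload["messages"][0]["content"][0]["type"])  # text
print(payload["messages"][0]["content"][1]["type"])  # image_url
```

The point of the structure is the article's point: image and text arrive in the same message, so the model can weigh them together rather than in isolation.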

The figures from hospital- and university-affiliated evaluations are difficult to ignore. GPT-4V scored 88.9% on the image-based questions of USMLE Step 3, arguably the most clinically demanding licensing exam in American medicine. That is not merely a passing grade; it is genuinely impressive. For comparison, the text-only GPT-4 scored 66.7% on the same questions, and GPT-3.5 Turbo managed only 50%.

That is a significant generational leap. No single announcement marked the shift, but somewhere between 2022 and now, a threshold was clearly crossed.

This, however, is where things get complicated, and where the enthusiasm should pause. When researchers from several institutions examined not just whether GPT-4V reached the correct answer but how it got there, the reasoning, the image interpretation, the logical chain, they found something unsettling.

In over one-third of the cases where GPT-4V made the right diagnosis, its justification was flawed: the model was right for reasons that weren't entirely convincing. And more than three-quarters of its errors stemmed from misinterpreting the image. It was, in a sense, sometimes reading the room correctly while misreading the walls.

This is not a small technical detail. In medicine, the logic is just as important as the outcome. You would want to double-check a doctor who provides a patient with the correct diagnosis but explains it using flawed reasoning. An algorithm should be held to the same standards.

I was struck by one researcher's framing: a correct final answer reached through faulty reasoning is not the same thing as a correct clinical decision. The difference between those two is where patients live.

A genuinely peculiar dynamic is developing here, one reminiscent of what happened when GPS navigation became ubiquitous: people started reliably arriving at their destinations while losing all sense of how the roads connect.

With diagnosis, GPT-4V may be doing something similar, generating accurate results through partially opaque processes that practitioners are unable to fully examine. In its early years, Tesla encountered similar concerns regarding automation and accountability. As was to be expected, the argument grew louder before it was settled.

When researchers added expert guidance to GPT-4V's workflow, essentially having a physician look over the model's shoulder, the error rate dropped by an average of 40%. That number matters because it suggests the model's ceiling is far higher in combination with human expertise than when operating alone.
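To put that 40% figure in rough perspective against the NEJM result: GPT-4V alone scored 81.6%, an 18.4% error rate, and cutting errors by 40% would leave about an 11% error rate, or roughly 89% accuracy. This back-of-the-envelope sketch assumes the reported average reduction applies uniformly; it is an illustration of the arithmetic, not a result reported by the studies.

```python
def accuracy_with_oversight(standalone_accuracy: float, error_reduction: float) -> float:
    """Back-of-the-envelope: shrink the error rate by a given fraction.

    Assumes (illustratively) that the reported average error reduction
    applies uniformly across cases.
    """
    error = 1.0 - standalone_accuracy
    return 1.0 - error * (1.0 - error_reduction)


# 81.6% standalone accuracy (NEJM Image Challenge) with a 40% error
# reduction from expert guidance.
combined = accuracy_with_oversight(0.816, 0.40)
print(f"{combined:.1%}")  # 89.0%
```

Even under that simplifying assumption, the combined figure lands comfortably above both the model alone (81.6%) and the physicians alone (77.8%), which is the argument for pairing them.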

The most likely short-term scenario is not AI replacing the radiologist in the dark room. It is AI sitting next to them: flagging what it sees, sometimes catching what tired eyes miss, and being caught in turn when its visual reasoning wanders.

Whether that collaboration survives real-world clinical settings, with all their commotion, urgency, and liability concerns, remains to be seen. Researchers are cautious. Hospital administrators are warier still.

Additionally, the regulatory agencies that must approve these systems for clinical use operate at a speed that makes academic publishing appear quick. Whether the excitement in research papers will result in deployment schedules expressed in years or decades is still up in the air.

Something has changed, that much is certain. In medical AI circles, the question is no longer whether a machine can read a scan as well as a doctor; that question has a tentative answer. The harder questions, about accountability, error patterns, and what happens when the AI is flatly wrong, are only now being taken seriously. From the outside, the hospital room with the two monitors may look exactly the same. What happens inside it has already changed.

Melissa Hogan

Melissa Hogan is the Senior Editor at Temporaer, and quite possibly the person on the internet who has thought the most about what happens to your data when a hard disk drive fails. She is a self-described storage hardware obsessive — the kind of person who reads NVMe specification documents for fun, tracks NAND flash fab yield rates with genuine emotional investment, and has strong, considered opinions about why QLC cells are misunderstood by mainstream tech media. She came to technology writing the way many of the best specialists do: not through a newsroom, but through an obsession that simply refused to stay quiet. Melissa, a stay-at-home mother, is an example of what the technology industry frequently undervalues: the serious, self-made expert who exists entirely outside of the institutional pipeline. She developed her technological expertise solely through self-directed learning, practical hardware experimentation, and an extraordinary appetite for technical documentation. She doesn't have a degree in journalism or experience in corporate technology, but what she brings to her editorial work at Temporaer is something more uncommon: a sincere, unfulfilled passion for how computers store, retrieve, and safeguard data, along with the patience to fully comprehend it and the ability to articulate it.

