
Microsoft Invents an AI Doctor Better Than Human Doctors

A new study presents a committee of digital doctors achieving approximately 80% accuracy in complex medical diagnoses – far beyond the average of general practitioners. This represents a breakthrough that could change the future of medicine, and perhaps many other fields as well.

For 15 long years, vomiting was a way of life for John. He didn't do it for pleasure or to lose weight. He simply vomited. Walking down the street, and suddenly – vomiting. Eating dinner, and suddenly – you get the picture.

John didn't easily accept his condition.

"I had every stomach test and every allergy test available," he shared in a Reddit post, "and recently I was diagnosed with anxiety and the medications really helped, but [the vomiting] never stopped."

Eventually, John consulted with artificial intelligence, and following its recommendation, underwent an examination with an "ear-nose-throat and head-neck surgery" specialist and a brain scan. The examination revealed that he suffered from severe and persistent inner ear inflammation, which was easily treatable.

John is a pseudonym for a Reddit user, but the story, as far as can be determined, is real. It's accompanied by a wealth of additional stories shared by other users on the platform. One of the most notable involves a person who spent years going between medical specialists trying to understand the cause of his illness to no avail, until ChatGPT suggested the possibility of a specific mutation. The patient raised the issue with doctors, and a quick genetic test showed that the mutation was indeed the source of the problem.

It should be said now that one must be careful – and maintain critical thinking – when using artificial intelligence for medical advice. It can certainly reach accurate diagnoses, but it can also make serious mistakes and disguise them in convincing, professional language. However, those who know how to use it correctly can achieve nothing short of miraculous results.

This is what Microsoft's new study, released last week, demonstrated. It reveals that artificial intelligence is capable of diagnosing complex medical conditions with a high level of success: approximately 85 percent. These aren't cases of runny noses or nail fungus, but conditions that challenge even specialist doctors, presented to them as 'puzzles to solve' at the most prestigious medical conferences. The human doctors who succeed in solving these same puzzles get published in medical journals and earn recognition from all their colleagues.

Now it turns out that artificial intelligence can solve these same puzzles on its own. Not only that, but it's also available to any of us, at a cost of twenty dollars per month.

Now that I've excited you, let's start from the beginning and explain what's significant about this research (and what its weaknesses are), and why we all need to understand what Microsoft's researchers did – and how these same principles should be embedded in every profession and field of work today.

The Challenges Facing Doctors

A few years ago, I had to arrive at the hospital in the middle of the night for a reason that remains between me and the healthcare system. The zombie-doctor who examined me looked as if he hadn't slept for thirty hours, and he probably hadn't. I realized that if I wanted optimal treatment, I needed to wake him up.

"By the way, I'm currently collaborating with a researcher at the faculty," I mentioned casually, as he dragged his feet heavily toward the door, "we're publishing soon in the New England Journal of Medicine."

He stopped in his tracks like lightning had struck. The thin hairs on his ears vibrated as he slowly turned his head toward me. The nurse stared at me in awe. Two residents, with bat-like hearing achieved after their seventh cup of coffee, rushed into the room and begged for the honor to speak with me and analyze my case. When I finished my business in the department, the medical staff rolled out the red carpet for me and asked me to return anytime.

At least, that's how it felt. The actual treatment was less servile, but the reverence was still clear.

Why? Because all doctors read the New England Journal of Medicine. Anyone who publishes a medical article in that magazine earns fame in their department, and even throughout the entire hospital.

In regular newspapers for regular people, you can usually find the weekly crossword area, black and white puzzles, or "the country quiz." In the New England Journal, as you might expect, there are more advanced mental exercises designed for specialist doctors. These are based on challenges presented to doctors at conferences, where medical professionals are asked to diagnose a patient who comes to them with unusual symptoms.


The doctors in the challenge are invited to ask the 'patient' follow-up questions: "Have you recently visited tropical areas?", "Do you vomit after drinking tropical beverages?", "Has a dog tried to attack you?"

Participants can also request to put the patient through medical tests of all kinds: from blood tests, through MRI scans, to complete genetic sequencing. But there's a catch: they have to pay for these tests. Not out of their own pockets, of course. They're doctors. But the cost of each such imaginary test is carefully tallied. The doctors who win the challenge are those who actually succeed in diagnosing the patient – but also do so with the cheapest and fastest tests possible. In other words, the best doctors are those who manage to crack the disease with minimum hassle and cost to the patient, and of course to the insurer as well.

Victory in such a challenge is a badge of honor for doctors, since the cases are designed in advance to be difficult to solve, especially when doctors are limited in the cost of tests they can order.
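The scoring logic of such a challenge is easy to sketch: every ordered test adds to a running bill, and the final verdict pairs diagnostic correctness with total spend. Here is a minimal illustrative sketch in Python; all names (the class, the test price list, the example diagnosis) are my own invention, not taken from the study.

```python
from dataclasses import dataclass, field

@dataclass
class DiagnosticSession:
    """Tracks one sequential-diagnosis case: each ordered test adds to
    the running bill, and the final result pairs a correct diagnosis
    with the total cost it took to reach it."""
    true_diagnosis: str
    test_costs: dict[str, int]            # hypothetical price list for available tests
    spent: int = 0
    ordered: list[str] = field(default_factory=list)

    def order_test(self, name: str) -> int:
        """Order a test; its cost is added to the running bill."""
        cost = self.test_costs[name]
        self.spent += cost
        self.ordered.append(name)
        return cost

    def finish(self, diagnosis: str) -> dict:
        """Close the case: correctness and total spend are reported together."""
        return {"correct": diagnosis.lower() == self.true_diagnosis.lower(),
                "total_cost": self.spent}

# Example run: a cheap blood panel first, then an expensive MRI
session = DiagnosticSession(
    true_diagnosis="labyrinthitis",
    test_costs={"blood panel": 50, "MRI": 1400, "genetic sequencing": 3000},
)
session.order_test("blood panel")
session.order_test("MRI")
print(session.finish("labyrinthitis"))  # {'correct': True, 'total_cost': 1450}
```

A winning strategy in this setup is exactly what the challenge rewards: reaching `correct: True` while keeping `total_cost` as low as possible.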

So, what is the success rate of artificial intelligence in such a test? Can it beat doctors at their own game?

This is what Microsoft researchers decided to test, with a special committee of artificial intelligences they created for this purpose.

Microsoft's Medical Committee


The artificial intelligence that Microsoft created is actually a "committee" of artificial intelligences. It consists of five artificial intelligences, each of which 'plays' a different role:

Hypothesis Doctor: Examines different possibilities and ranks which are most likely.

Test-Selector Doctor: Chooses up to three tests that will help distinguish between leading hypotheses.

Challenger Doctor: Acts as a "devil's advocate" and identifies thinking biases so far, sheds light on conflicting evidence, and suggests tests that could refute the leading hypotheses.

Economist Doctor: Encourages selection of cheaper tests and vetoes expensive tests that aren't supposed to bring much benefit.

Reviewer Doctor: Performs quality checks in the background to ensure the committee uses correct names for tests it requests and doesn't violate challenge rules.

All five of these doctors are actually different instances of the most powerful GPT models. Each is given different priorities and incentives in its system prompt, so each emphasizes the points that concern it.

The final result is, literally, a committee. An autonomous expert committee. The artificial doctors 'talk' with each other, conduct a learned and polite discussion, and don't interrupt each other. They balance one another to reach a decision in each round: ask the patient questions, demand additional tests, or deliver the final diagnosis. When all committee members are convinced they're close enough to a diagnosis, they pass their answer to a judge who decides whether they were right.
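The orchestration pattern described above can be sketched in a few lines: each role is the same underlying model with a different system prompt, speaking in turn over a shared transcript. This is a minimal illustrative skeleton, not Microsoft's implementation; `ask_model` is a hypothetical stand-in for a real chat-model API call, stubbed here so the skeleton runs.

```python
# The five committee roles, each defined only by its system prompt.
ROLES = {
    "hypothesis":    "Rank the differential diagnoses by likelihood.",
    "test_selector": "Pick up to three tests that best separate the leading hypotheses.",
    "challenger":    "Argue against the leading hypothesis; surface biases and conflicting evidence.",
    "economist":     "Veto expensive tests with low expected diagnostic value.",
    "reviewer":      "Check that test names are correct and challenge rules are followed.",
}

def ask_model(system_prompt: str, transcript: list[str]) -> str:
    # Stub: a real implementation would send `system_prompt` as the role
    # definition and the shared transcript as context to an LLM API.
    return f"opinion given {len(transcript)} prior contributions"

def committee_round(transcript: list[str]) -> list[str]:
    """One deliberation round: every role speaks in turn, seeing all
    previous contributions, so later roles can rebut earlier ones."""
    for role, prompt in ROLES.items():
        transcript.append(f"{role}: {ask_model(prompt, transcript)}")
    return transcript

transcript = committee_round([])
print(len(transcript))  # 5 – one contribution per role
```

The key design choice is the shared transcript: because the challenger and economist see the hypothesis doctor's output before speaking, the loop produces genuine rebuttal rather than five independent monologues.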

Miraculously, this autonomous committee succeeds in reaching the correct diagnosis in 80 percent of cases. Oh, and it does so while relying on lower-cost tests ($2,396) than those ordered by most human doctors.

I want to emphasize this figure: artificial intelligence, whose operation costs at most a few dollars, succeeds with high probability in reaching the same diagnosis as the most advanced human doctors. And the diagnosis is also cheaper in terms of the tests needed to make it. In other words, not only is the diagnosis successful, but the diagnostic process is also more efficient.

How do we know this? Because Microsoft researchers also tested the performance level of human doctors on exactly the same questions. The performance, to put it gently, was not encouraging. The most successful human doctor reached a success rate of only forty percent in solving the questions. And the average doctor? He solved correctly only twenty percent of the questions, with an average test cost of $2,963.


Microsoft's artificial committee provides a glimpse of what's to come: a world where every person will benefit from advisory and support services from an entire committee of artificial doctors, which together is more successful than any human doctor.

And not just in medicine.

The Committee Model

"A committee," wrote Robert Heinlein many years ago, "is a life form with six or more legs and no brain."

Anyone who has ever sat on a committee can understand Heinlein's frustration. Committee participants are often driven by ulterior motives, don't really listen to each other, or mainly want to show off their wisdom and hear themselves talk. In many cases, junior committee members fear going against the position of more respected participants, like Nobel Prize winners or their managers. The most natural result of many committee discussions is to perpetuate the consensus: that prevailing conception no one dares challenge.

But does this have to be the case?

Microsoft's medical committee model demonstrates a new form of deliberation that some would say only artificial intelligence is truly capable of. Committee members really listen to each other, relate to each other's words, and challenge each other in a way that makes the committee's products more accurate than those of any 'individual' artificial intelligence.

What results could we get if management discussions were also conducted this way? Or security discussions? Or even government cabinet discussions? Or – hell – if we turned every professional and official decision into a decision resulting from committee discussions where artificial intelligences talk with each other?

How should such committees be built? Who should be committee members, and what is the optimal 'speaking' order for each of them? Should some have more power than others? Should one or more of them have the ability to veto committee decisions?

There aren't good answers to these questions, because we've never had 'thinking' entities, or at least ones with processes that mimic the results of human thinking, as we have today. How should such entities talk with each other, and what should be the rules of discourse between them? We have no idea. Simply no idea. Sociologists and management researchers in academia build entire careers on studying decision-making processes in committees and organizations. And here's another new profession we're about to see in the coming years: managing and regulating interactions between artificial intelligences in committees and in general.

And perhaps one more profession, old-fashioned: critical thinking. Because even in Microsoft's research, problems can be found, and quite a few of them. While they don't change the final message – that artificial intelligence will positively impact our lives, including through discourse between artificial intelligences – it's important to address them as well.

The Small Details

When I was doing my doctorate in nanotechnology, I had to review many studies. Each looked impressive from the outside, and only a careful reading of the fine print revealed the problems hidden within. In my distress, I turned to one wise doctor, and she told me –

"Studies are like sausages: after you see how they're made, you're no longer willing to swallow them with the same ease."

She was right. Every study has problems and small details that make it difficult to accept the result as clear truth. This is also the case with Microsoft's new study.

I said they compared artificial intelligence performance to doctors? That's true, but which doctors? Well, all the doctors in the study were "general" doctors. That is, without specific expertise in the questions they were asked. So from the outset, Microsoft is comparing their artificial intelligence to doctors at the 'basic' level.

But it gets worse.

The doctors who participated in the study were asked not to use external sources to answer the questions. No Google, no ChatGPT, no other online source. Nothing. That is, they had to answer relying solely on their human brains. I'm not sure there's still a doctor who knows how to think without external aids. But that's how they had to answer.

But it's even worse.

The human doctors received 56 cases they were asked to diagnose. No time limit was imposed on them, but it's safe to assume that after several such cases, they were exhausted. It would have been interesting to compare their success rate on the first cases versus the last ones, when they were already at the end of their strength. Artificial intelligence, of course, has no such problems. It could have continued processing thousands of cases until the collapse of human civilization and its replacement by intelligent octopi.

But it's even worse.

I said the autonomous medical committee reached accurate and efficient results (with cheap tests). That's true, but I didn't mention that the researchers also threw "reasoning models" into the mix: for example, OpenAI's o3, Claude 4 Opus, and Gemini 2.5 Pro. These models weren't operated as a medical committee; they were simply asked what disease the patients suffered from. They were also given the opportunity to ask questions and order tests. They worked as 'single' models, not as part of an autonomous medical committee.

And they succeeded at a high level.

The o3 model – the one any of us can use for twenty dollars a month – managed to reach nearly eighty percent accuracy in its diagnoses (although it required particularly expensive tests before reaching a conclusion). Claude 4 Opus and Gemini 2.5 Pro reached about seventy percent accuracy, and they too ordered expensive tests.


The committee model still succeeded more than the single engines, but not by much. And perhaps this shouldn't surprise us. We know that when reasoning engines try to solve a complex problem, they attack it from different directions, angles, and viewpoints. It's quite possible that the committee model is already embedded, automatically, in reasoning engines: they 'separate themselves' into different lines of thought when trying to solve a problem, let these talk with each other, and summarize the results. Externally it appears as if there's only one line of thought. But internally? Only the AI companies know exactly how reasoning models solve problems.

And all this pedantic nitpicking about the research doesn't change the final result, as mentioned: artificial intelligences available to all of us today are capable of answering complex medical questions with approximately eighty percent accuracy. Do they do this better than human doctors? Maybe. With all the limitations the researchers imposed on the human doctors, it's quite clear the playing field was tilted. But who cares. The main thing is that artificial intelligences can solve medical problems at such a high level.

This success, by itself, will change the world and the medical profession. Each of us will have a committee of specialist doctors looking at us and examining us whenever we want. Everyone will be able to receive recommendations from such a committee, at zero cost. And this will happen in every field: in medicine, in law, in accounting, and in matchmaking. Everything.

This is the true meaning of Microsoft's research.
