If AI can provide a better diagnosis than a doctor, what’s the prognosis for medics? | John Naughton
AI means too many (different) things to too many people. We need better ways of talking – and thinking – about it. Cue Drew Breunig, a talented geek and cultural anthropologist, who has come up with a neat categorization of the technology into three use cases: gods, interns and cogs.
“Gods” in this sense would be “superintelligent, artificial beings that do things autonomously.” In other words, the AGI (artificial general intelligence) that OpenAI’s Sam Altman and his crowd are trying to build (at reckless expense) while warning that it could be an existential threat to humanity. AI gods are, says Breunig, “the use cases of human replacement.” They require gigantic models and huge amounts of “computation”, water and electricity (not to mention the associated CO2 emissions).
“Interns” are “supervised co-pilots who collaborate with experts, focusing on the grunt work.” In other words, things like ChatGPT, Claude, Llama and similar large language models (LLMs). Their defining quality is that they are designed to be used and supervised by experts. They have a high tolerance for errors, because the experts they assist check their output, preventing embarrassing mistakes from going further. They do the grunt work: remembering documentation and navigating references, filling in details once the broad outline has been defined, helping to generate ideas by acting as a dynamic sounding board, and much more.
Finally, “cogs” are small machines that are optimized to perform a single task exceptionally well, usually as part of a pipeline or interface.
Interns are mostly what we have now; they represent AI as a technology that augments human capabilities, and they are already widely used in many industries and professions. In that sense, they are the first generation of quasi-intelligent machines with which humans have had close cognitive interaction in work settings, and we are beginning to learn interesting things about how well these human-machine partnerships work.
One area where there are extravagant hopes for AI is healthcare. And with reason. In 2018, for example, a collaboration between AI researchers at DeepMind and Moorfields Eye Hospital in London significantly accelerated the analysis of retinal scans to detect the symptoms of patients requiring urgent treatment. But in a sense, although technically difficult, that was a straightforward task: machines can “read” scans incredibly quickly and pick out those that need specialist diagnosis and treatment.
But what about the diagnostic process itself? Cue an intriguing American study published in October in the Journal of the American Medical Association, which reported on a randomized clinical trial of whether ChatGPT could improve the diagnostic abilities of 50 medical practitioners. The ho-hum conclusion was that “the availability of the LLM for physicians as a diagnostic aid did not significantly improve clinical reasoning compared to conventional resources.” But there was a surprising twist: ChatGPT on its own performed better than both groups of physicians (those with and those without access to the machine).
Or, as the New York Times summed it up, “physicians who were given ChatGPT-4 along with conventional resources did only slightly better than physicians who did not have access to the bot. And, to the researchers’ surprise, ChatGPT alone outperformed the doctors.”
More interesting, however, were two other revelations: the experiment demonstrated doctors’ sometimes unwavering faith in a diagnosis they had made, even when ChatGPT suggested a better one; and it also suggested that at least some of the doctors didn’t really know how best to exploit the tool’s capabilities. Which in turn confirmed what AI advocates such as Ethan Mollick have been saying for eons: that effective “prompt engineering” (knowing what to ask an LLM to get the most out of it) is a subtle and poorly understood art.
Equally interesting is the effect AI collaboration has on the humans involved in the partnership. At MIT, a researcher conducted an experiment to see how well materials scientists could do their jobs if they could use AI in their research.
The answer was that the AI assistance really did seem to work, as measured by the discovery of 44% more materials and a 39% increase in patent applications. This was achieved by the AI performing more than half of the “idea generation” tasks, leaving the researchers to evaluate the model-generated candidate materials. So the AI did most of the “thinking”, while the researchers were relegated to the more mundane work of assessing the practical feasibility of its ideas. And the result: the researchers experienced a sharp decrease in job satisfaction!
Interesting, n’est-ce pas? These researchers are high-status professionals, not low-status employees. But suddenly, collaborating with an intelligent machine made them feel like… well, cogs. And the moral? Be careful what you wish for.
What I’ve been reading
Chamber piece
“What if the echo chambers work?” is a striking essay that highlights a liberal dilemma in the age of Donald Trump.
Savings plan
“Charting the path for Elon Musk’s drive for efficiency” is a sharp analysis from Reuters.
Inventive thinking
Steven Sinofsky’s great, wise essay “At the expense of being a destroyer” is about innovation and change.