The Problem: A Genome Is a Haystack

Finding the genetic cause of a rare disease is genuinely hard. There are roughly 20,000 protein-coding genes in the human genome. Sequencing a patient’s DNA is now routine. Making sense of it is not.
New research gets published constantly. A gene-disease link that didn’t exist in the literature when a child was first seen at a hospital might be well-documented two years later. But no human analyst has the bandwidth to re-examine hundreds of cold cases every time a new paper drops.
“A researcher can only spend so much time on a single case,” said Suyash Shringarpure, a technical researcher at OpenAI focused on health applications. “Maybe a case remained unsolved when it came to them first, but a year later a paper was published that clarifies the link between the gene and the disease.”
That’s exactly the kind of scattered, high-volume pattern recognition that large language models are built for.
The Workflow: How They Actually Used O3

This wasn’t a speculative pilot. The team at the Manton Center for Orphan Disease Research ran a structured, reviewable process.
For each of the 376 cases, researchers fed O3 three inputs:
- Clinician notes on the patient’s history
- A description of the patient’s symptoms
- A filtered list of candidate genes potentially linked to those symptoms
The model then searched for diagnostic connections across its training data — essentially doing in minutes what a human analyst might spend days on. Every output was reviewed by the human research team before any diagnosis was confirmed.
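To make the three inputs concrete, here is a minimal Python sketch of how a single case could be bundled into one prompt. The `CaseRecord` class, its field names, and the prompt wording are assumptions for illustration only; the article does not describe the team's actual data format or prompts, and the sketch calls no model API.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """Hypothetical container for the three inputs listed above."""
    case_id: str
    clinician_notes: str
    symptoms: str
    candidate_genes: list[str] = field(default_factory=list)


def build_prompt(case: CaseRecord) -> str:
    """Assemble one plain-text prompt from a case (wording is illustrative)."""
    genes = ", ".join(case.candidate_genes)
    return (
        f"Case {case.case_id}\n"
        f"Clinician notes: {case.clinician_notes}\n"
        f"Symptoms: {case.symptoms}\n"
        f"Candidate genes: {genes}\n"
        "Which candidate genes, if any, plausibly explain these symptoms? "
        "Cite the supporting evidence for each."
    )


# Entirely made-up placeholder values, not taken from the study.
example = CaseRecord(
    case_id="demo-001",
    clinician_notes="Placeholder history.",
    symptoms="Placeholder symptom description.",
    candidate_genes=["GENE_A", "GENE_B"],
)
prompt = build_prompt(example)
print(prompt)
```

Keeping each case in a fixed structure like this is one way to make a batch of hundreds of cases repeatable and easy to audit later.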
The results, published in NEJM AI, amounted to 18 new diagnoses across four disease categories:
- 10 patients with rare neurodevelopmental diseases
- 4 patients with neuromuscular disorders
- 2 children who had died suddenly without a prior explanation
- 2 patients with early childhood psychosis
A diagnostic yield of roughly 5% (18 of 376 cases) sounds modest. In this context, it’s remarkable: these were cases that had already been analyzed multiple times by specialists.
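The arithmetic behind that figure is easy to check. The short Python sketch below uses only the counts reported in this article; the variable names and category labels are ours.

```python
# Counts as reported in the article.
total_cases = 376
diagnoses_by_category = {
    "rare neurodevelopmental diseases": 10,
    "neuromuscular disorders": 4,
    "sudden unexplained death": 2,
    "early childhood psychosis": 2,
}

total_diagnoses = sum(diagnoses_by_category.values())
diagnostic_yield = total_diagnoses / total_cases

print(f"{total_diagnoses} of {total_cases} cases")  # 18 of 376 cases
print(f"yield: {diagnostic_yield:.1%}")             # yield: 4.8%
```

The exact figure is about 4.8%, which rounds to the 5% quoted above.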
One Patient’s Story
Kyra Benton started walking on her tiptoes at age 9. Specialists in New York had no answers. Boston Children’s Hospital had no answers. By 13, she’d had a tracheotomy and come to terms with never knowing what was wrong with her.
Then, a week before her 20th birthday, a researcher from the Manton Center called.
“She said, ‘Hi, we know it’s been about 15 years, but we have some news for you,’” Benton recalled.
The diagnosis: myofibrillar myopathy, a progressive genetic neuromuscular disorder that causes muscle fibers to break down.
More than a decade of uncertainty, resolved with help from a model that doesn’t get tired of reading gene lists.
It’s a screening tool, not a replacement
Adam Rodman, an AI-in-medicine expert at Beth Israel Deaconess Medical Center, called the 5% diagnostic yield “truly meaningful” as a screening mechanism — particularly for clearing backlogs of unresolved cases. The model flags; the clinician decides.
It democratizes access to rare disease expertise
Not every hospital has a team of geneticists who can spend days on a single ambiguous genome. O3 doesn’t require a specialist on staff. It requires a clinician who knows how to prompt it well and review its outputs critically.
Even “rediscoveries” count
Seven of the 18 diagnoses were technically rediscoveries — cases where a diagnosis existed somewhere in the world but hadn’t been shared globally. Getting those patients on record matters enormously when new treatments emerge. You can’t fast-track a patient to a clinical trial if you don’t know their diagnosis.
The Honest Caveats
The research team was deliberate about not overselling this.
A diagnosis is a starting point, not a cure. Many rare diseases still have no treatment options. And LLMs are not consumer diagnostic tools — Chunhua Weng, a bioinformatics professor at Columbia, emphasized that “appropriate use of LLMs in diagnosis requires careful attention to trustworthiness.”
The workflow here worked because trained researchers controlled the inputs, reviewed the outputs, and applied clinical judgment throughout. The model was a powerful collaborator. It was not the doctor.
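One way to picture that division of labor is a review gate: a model suggestion stays pending until a named human accepts or rejects it. The Python sketch below is an assumption about how such a gate could be modeled, not a description of the Manton Center's actual tooling; `Finding`, `Status`, and `review` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class Finding:
    """A model-suggested gene for a case (hypothetical structure)."""
    case_id: str
    gene: str
    status: Status = Status.PENDING_REVIEW
    reviewer: Optional[str] = None


def review(finding: Finding, reviewer: str, accept: bool) -> Finding:
    """Record a human decision; nothing changes status without a named reviewer."""
    if not reviewer:
        raise ValueError("a human reviewer is required")
    finding.status = Status.CONFIRMED if accept else Status.REJECTED
    finding.reviewer = reviewer
    return finding


# Placeholder values for illustration only.
suggestion = Finding(case_id="demo-001", gene="GENE_A")
print(suggestion.status.value)  # pending_review
review(suggestion, reviewer="analyst-1", accept=True)
print(suggestion.status.value)  # confirmed
```

The point of the design is that the model can only ever produce a pending suggestion; confirmation is a separate step that requires a person.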
The Takeaway for AI Adopters
If you work in healthcare, research, or any field where the bottleneck is connecting existing knowledge to specific cases at scale, this is the use case to watch.
O3 didn’t discover new science. It connected existing science to patients who needed it, faster than any human team could manage alone. That’s a workflow problem solved by a language model. And that’s exactly the kind of unglamorous, high-impact application that tends to age well.
Kyra Benton, for her part, admitted she’s not exactly an AI enthusiast. But she acknowledged the obvious: “It can lead to massive breakthroughs that can really change people’s lives for the better.”
Sometimes the most compelling AI case study is the one where the technology just quietly did its job.