The Promise Is Real. So Is the Friction.

AI scribes are no longer a pilot program curiosity — they’re being deployed at scale across major health systems right now. A JAMA study published in April found that heavy users of these tools saved more than 30 minutes of work per day, one year after installation. That’s not nothing. For a burned-out clinician drowning in documentation, half an hour is a lifeline.
But ask Paul Boyer, a Kaiser Permanente psychotherapist in Oakland, and you get a different story. His system uses Abridge, one of the leading names in healthcare ambient AI. His verdict: “not super useful.”
The software misses clinical nuance. It can’t read emotional tone. In mental health care, how a patient speaks often matters more than what they say — and that’s exactly where the AI falls flat. Boyer and his colleagues spend time correcting notes instead of saving time writing them.
That’s the gap between benchmark performance and real-world clinical value. It’s wide, and it varies enormously by specialty.
When “Good Enough” Isn’t Good Enough

Abridge monitors clinician edits, star ratings, and free-text feedback to improve note quality. That’s a reasonable feedback loop — but it assumes clinicians are actually catching the errors.
Safety researchers aren’t so sure they are.
Raj Ratwani, a human factors researcher at MedStar Health, puts the concern plainly: there is currently no federal safeguard vetting AI scribe software before it reaches clinicians. If a note contains a subtle omission — a missed medication, a mischaracterized symptom — and a busy doctor skims past it, that bad information enters the patient’s permanent record. Future clinicians inherit it. Decisions get made on it.
A VA study comparing 11 AI scribes against human-authored documentation found the software underperformed humans across all five simulated scenarios. The most alarming finding: critical information was being left out. The kind of information that shapes follow-up care.
The vendors weren’t named. For contractual reasons.
The Regulation Gap Is Getting Wider

Here’s where the story shifts from product critique to systemic risk.
The Office of the National Coordinator for Health IT — the body overseeing electronic health records — has proposed rules that would roll back two meaningful protections: user-centered design testing and AI transparency requirements.
User-centered design testing requires developers to test their products with doctors and nurses before deployment. It is intended to prevent the kind of interface chaos Ratwani describes: medication lists so cluttered that a physician accidentally orders the wrong dose of Tylenol from 30 nearly identical options.
The proposed rules don’t eliminate the testing requirement outright. They just remove the obligation to report results to regulators. Companies still have to do the work. They just don’t have to show anyone.
That’s a distinction without much practical difference.
The Transparency Card Gets Pulled

The Biden administration introduced AI model cards — a simple click-through that let clinicians see what data trained the AI tools advising their care decisions. Adoption was low. The Trump administration used that as justification to scrap the requirement.
The American Hospital Association pushed back. The American College of Physicians pushed back. Even EHR developers were divided, which, as one industry insider noted, is unusual for members of a trade group that normally aligns quickly.
The argument for keeping transparency tools isn’t sentimental. It’s practical. Clinicians need to know why an AI is making a recommendation before they act on it. A black box that says “do this” is not a clinical decision support tool. It’s a liability.
The Market Dynamics Underneath

Proponents of deregulation argue that fewer rules mean more competition and more innovation. The EHR market is highly consolidated — Epic and Oracle Health held over 70% of the hospital market as of 2022 — and the argument goes that regulatory burden keeps smaller players out.
That’s a real tension. Consolidation is a genuine problem. But removing usability and transparency requirements doesn’t automatically produce better competitors. It just removes the floor.
And into that floor-free market, a wave of AI tools is arriving — each needing its own evaluation, each carrying its own risk profile, most of them unvetted at the federal level.
What Clinicians Are Actually Living With

Boyer’s worry isn’t abstract. He can ignore the AI scribe for now. But he’s watching for the next move: management redesigning his schedule around the expected time savings, adding more patients, assuming the AI is handling the documentation load.
If that happens, he’d need to spend more time with patients and more time correcting AI errors. The efficiency gain evaporates. The workload compounds.
Kaiser Permanente says it doesn’t require clinicians to use AI. That’s a reasonable policy. But institutional pressure is rarely that simple, and the incentive structures around productivity are rarely that neutral.
The Takeaway for Anyone Watching This Space

Healthcare AI scribes are a legitimate category with real productivity upside — and real quality variance by specialty, vendor, and deployment context. The JAMA data is encouraging. The VA study is sobering. Both are true at the same time.
The deeper issue isn’t whether AI scribes work. Some do, in some contexts, for some clinicians. The issue is that the infrastructure for evaluating them — testing requirements, transparency standards, usability mandates — is being dismantled faster than the evidence base is being built.
Choosing smarter in this space means looking past the demo. Ask what the error rate looks like in production. Ask what the correction workflow costs clinicians in time. Ask what happens when the AI is wrong and no one catches it.
The tools are moving fast. The safeguards are moving backward. That gap is worth watching closely.