Rob Bart remembers what it felt like, the moment of discovery. As an intern at Duke University Medical Center in the ’90s, he’d sometimes be tasked with poring through a patient’s medical history to uncover the cause of their latest hospitalization. Back then, the stacks of paper records could tower 18 inches tall.
“I can remember that needle in the haystack feeling,” said Bart, now chief medical information officer at the University of Pittsburgh Medical Center, “when you found that one thing in the medical record that helps us figure this out.”
Today, he replays those memories when he hears his colleagues talking about the promise of large language models to summarize medical records. Electronic health records have led to even larger haystacks that stymie needle-hunting clinicians. But despite the capabilities of models like OpenAI's GPT-4, it's so far unclear whether they're ready for the high stakes of clinical summarization, where a single missing word could change a diagnosis.