What is the Role of Large Language Models in Real-World Evidence Generation?
Author:
Aracelis Torres, PhD, MPH
SVP of Data & Science
The healthcare industry is in a period of transformation, driven by the convergence of real-world data (RWD) and artificial intelligence (AI). Among the most significant innovations are large language models (LLMs), generative AI models that can analyze and curate large amounts of RWD, at unprecedented scale and speed, to generate real-world evidence (RWE).
When combined with human expertise, large language models and real-world evidence have the power to unlock critical insights and provide a more comprehensive view of the patient journey, which in turn allows life sciences companies and clinicians to tailor treatments and improve care.
Utilizing LLMs to Process RWD
LLMs learn skills by analyzing vast amounts of RWD, such as structured and unstructured electronic health record (EHR) data. Structured data consists of patient demographics, diagnosis codes, medication names and numeric lab results. This information is standardized and easily searchable by data analysis tools.
About 80% of healthcare data, however, is unstructured, including symptoms, lifestyle factors, treatment decisions and images. This information primarily lives in the free-form text that clinicians type into the notes section of EHRs, and its non-standardized format makes it harder to search.
However, once organized and de-identified by LLMs, using natural language processing (NLP) and machine learning (ML) techniques, unstructured data can reveal trends and patterns in patient care and disease progression to help identify key milestones or potential risk factors for certain conditions.
Turning RWD into Meaningful Insights
By carefully curating RWD to flag patterns of language consistent with certain clinical cues, LLMs can extract meaningful insights. But LLMs can’t do it alone: clinical expertise in data-driven research is needed to ensure that AI models are continually trained and validated on how they catalog and categorize unstructured data, so that the data becomes easier to analyze.
In addition to utilizing secure and advanced technology, and possessing exclusive partnerships with leading medical societies to access RWD, Verana Health has physicians and data scientists on staff. These expert teams possess deep medical data analytics and RWE expertise to create novel AI models that search clinical notes within EHRs to make sense of the data.
To gain a better understanding of how AI and LLMs generate quality RWE when paired with expert teams, let’s examine prostate cancer. When assessing prostate cancer severity and risk of metastasis, not every clinician uses the well-known TNM staging system, especially if it is a follow-up patient visit. So, Verana Health developed ML models to surface strong indicators of metastatic risk, such as references to “growing sites of metastasis on scan” or “positive bone scan.”
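To make the idea of surfacing implicit indicators concrete, here is a minimal Python sketch. It is not Verana Health's ML model: it swaps in simple pattern matching with a crude negation check as a stand-in. The two phrases come from the example above; the function name, the negation cues and the sentence-splitting rule are illustrative assumptions.

```python
import re
from typing import List

# Phrases quoted in the article; the patterns themselves are assumptions.
METASTASIS_PATTERNS = [
    re.compile(r"growing sites? of metastasis", re.IGNORECASE),
    re.compile(r"positive bone scan", re.IGNORECASE),
]

# Hypothetical negation cues, checked in the same sentence before a match.
NEGATION = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


def flag_metastatic_risk(note: str) -> List[str]:
    """Return matched indicator phrases that are not preceded by a negation cue."""
    hits = []
    for pattern in METASTASIS_PATTERNS:
        for match in pattern.finditer(note):
            preceding = note[:match.start()]
            # Only look back as far as the start of the current sentence.
            sentence_start = preceding.rfind(".") + 1
            if NEGATION.search(preceding[sentence_start:]):
                continue
            hits.append(match.group(0).lower())
    return hits


if __name__ == "__main__":
    print(flag_metastatic_risk("Follow-up visit. Positive bone scan noted today."))
```

A rule-based pass like this is easy to audit, but it misses paraphrases and misspellings, which is one reason trained models reviewed by clinicians are used for this kind of task.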
By expanding from the explicit documentation of the TNM staging system to include more implicit types of evidence, Verana Health was able to capture approximately 5x more instances of metastasis. Additionally, Gleason score, PSA level and castration resistance are key variables that can be curated from unstructured clinical notes to study the patient journey at scale.
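As a rough illustration of curating such variables from free text, the following Python sketch pulls a Gleason score and a PSA value out of a note with regular expressions. The note wording, the patterns, the function name and the output keys are all assumptions made for this example; real clinical notes vary far more, and this is not a description of Verana Health's models.

```python
import re
from typing import Dict, Optional

# Assumed note formats: "Gleason 4+3" or "Gleason score of 4 + 3".
GLEASON = re.compile(r"gleason(?:\s+score)?\s*(?:of\s*)?(\d)\s*\+\s*(\d)", re.IGNORECASE)
# Assumed note format: "PSA 6.2 ng/mL" or "PSA of 6.2 ng/mL".
PSA = re.compile(r"\bPSA\b[^0-9]{0,20}(\d+(?:\.\d+)?)\s*ng/mL", re.IGNORECASE)


def extract_variables(note: str) -> Dict[str, Optional[float]]:
    """Extract a Gleason total and a PSA level (ng/mL) from a note, if present."""
    result: Dict[str, Optional[float]] = {"gleason_total": None, "psa_ng_ml": None}
    gleason = GLEASON.search(note)
    if gleason:
        # The Gleason total is the sum of the primary and secondary grades.
        result["gleason_total"] = int(gleason.group(1)) + int(gleason.group(2))
    psa = PSA.search(note)
    if psa:
        result["psa_ng_ml"] = float(psa.group(1))
    return result


if __name__ == "__main__":
    print(extract_variables("Gleason 4+3, PSA of 6.2 ng/mL at last visit."))
```

Fields that cannot be found are returned as None rather than guessed, so downstream analysis can tell missing values from measured ones.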
Utilizing LLMs to curate RWD and generate RWE can help researchers determine if patients are receiving the right treatments, based on disease progression. Key variables can also track a patient’s response to a prescribed therapy.

ML models also help identify patients when standardized codes are missing from the structured data of EHRs. For example, using standard ICD-10 codes alone, Verana Health identified 330,000 patients with geographic atrophy. Applying ML capabilities to unstructured clinical notes uncovered a significant undercount: an additional 476,000 patients were identified, expanding the total cohort to more than 810,000 patients.
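The cohort expansion described above amounts to a set union: patients found through diagnosis codes plus patients found only in clinical notes. The Python sketch below shows that bookkeeping on toy patient IDs. The function name, the ID values and the output keys are hypothetical, and it assumes each patient has a single stable identifier across both sources.

```python
from typing import Dict, Set


def combine_cohorts(coded_ids: Set[str], note_ids: Set[str]) -> Dict[str, int]:
    """Merge a code-based cohort with a notes-based cohort without double counting."""
    # Patients the notes-based model found that the codes missed.
    added_from_notes = note_ids - coded_ids
    # The union counts each patient once, even if both sources found them.
    combined = coded_ids | note_ids
    return {
        "coded": len(coded_ids),
        "added_from_notes": len(added_from_notes),
        "total": len(combined),
    }


if __name__ == "__main__":
    coded = {"p1", "p2", "p3"}   # toy IDs found via diagnosis codes
    notes = {"p3", "p4", "p5"}   # toy IDs found via clinical notes
    print(combine_cohorts(coded, notes))
```

Tracking the overlap separately matters because a patient identified by both sources should add to the cohort only once.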

Key variables such as lesion size, location and growth rate, along with other criteria used to identify geographic atrophy (GA) and chart its progression, can also be identified more efficiently in ophthalmic images with models than through manual review. Verana Health has access to tens of thousands of images for more than 2,000 patients with GA. These images can be used to train AI models to identify disease progression at scale.
Capturing the Complete Patient Journey with RWE
Large language models and real-world data hold significant promise for advancing the generation of RWE in healthcare. By harnessing the power of LLMs, we can unlock deeper, more comprehensive insights into treatment outcomes and patient experiences to better understand the prevalence and progression of a variety of diseases.
To truly know if you’re capturing the complete patient population and journey, you need a quality RWD source (e.g., structured EHR data, unstructured EHR data or images) that closely represents the clinical interpretation of the patient’s condition and experience. Most off-the-shelf LLMs are trained on a vast amount of publicly available data. Verana Health combines its exclusive access to specialty data with vital subject matter expertise and scales it through a secure technology platform. Training AI algorithms on expansive clinical datasets, and incorporating nuanced interpretations that are developed and continually refined by practicing clinicians, gives you the broadest view of the data and the clearest insight into what’s really happening at each step of the patient journey.
To learn how Verana Health’s RWD and RWE solutions are tapping into unstructured data that’s easily scalable and accessible, visit: https://veranahealth.com/solutions/.