A Guide to Leveraging Real-World Data and Artificial Intelligence for Real-World Evidence Generation
Real-world data (RWD) and artificial intelligence (AI) have emerged as powerful tools in the healthcare industry, revolutionizing the way real-world evidence (RWE) is generated and decisions are made across the drug and device development lifecycle. This guide aims to provide life sciences companies with insights needed to effectively harness RWD and AI to leverage more expansive data sources and generate RWE that supports critical business decisions.
In the dynamic landscape of healthcare, stakeholders across the life sciences industry, from clinical development, to health economics and outcomes research (HEOR), to commercialization, are increasingly recognizing the value of RWE derived from RWD using AI.
AI techniques, such as machine learning (ML) and natural language processing (NLP), have transformed our ability to leverage more complex – and more valuable – data. As the demand for evidence-based decision-making grows, it is paramount that each stakeholder group within the healthcare and life sciences ecosystem understand and fully prepare to harness the wealth of data that is available.
Making Sense of Unstructured Electronic Health Record Data
Due to their structured nature, medical claims have been the de facto source of RWD for the industry. Unfortunately, claims data is inherently limited in the face of the complex problems organizations are trying to solve because it captures only care utilization, not treatment outcomes, details of diagnostic evaluations, or the patient experience. Electronic health records (EHRs), on the other hand, contain a wealth of information that is vital for such analyses, with the caveat that much of it can’t be used in its raw, unstructured format.
With recent advances in technology and data processing capabilities, EHR and claims datasets can be linked to offset the limitations of each individual source, allowing researchers to see previously undetected patterns or connections. Linking EHR and claims data can help life sciences companies stratify subsets of patients based on treatment outcomes, evaluate market performance and product safety, and better understand how clinicians are prescribing a drug and what treatment patterns look like across multiple therapeutic areas.
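As a minimal sketch of what such a linkage can look like, the Python below joins a few made-up EHR and claims records. It assumes both sources already carry a shared de-identified key (called `token` here); the records, field names, and the `link_records` helper are all hypothetical and do not describe any particular vendor's pipeline.

```python
from collections import defaultdict

# Hypothetical de-identified records; "token" is an assumed shared linkage key.
ehr_records = [
    {"token": "t1", "outcome": "improved"},
    {"token": "t2", "outcome": "stable"},
    {"token": "t3", "outcome": "worsened"},
]
claims_records = [
    {"token": "t1", "drug": "DrugA"},
    {"token": "t1", "drug": "DrugB"},
    {"token": "t3", "drug": "DrugA"},
]


def link_records(ehr, claims):
    """Attach each patient's claimed drugs to their EHR record via the token."""
    drugs_by_token = defaultdict(list)
    for claim in claims:
        drugs_by_token[claim["token"]].append(claim["drug"])
    # Patients with no matching claims keep an empty drug list.
    return [
        {**record, "drugs": drugs_by_token.get(record["token"], [])}
        for record in ehr
    ]


linked = link_records(ehr_records, claims_records)
for row in linked:
    print(row)
```

Once linked, each row carries both an outcome (from the EHR) and utilization (from claims), which is what makes outcome-based stratification possible.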

However, tapping into the value of unstructured EHR data is complex. It requires data cleansing, harmonization, curation, and the ability to clinically interpret text-based documentation. Doing so at scale requires experience with AI, which is Verana Health’s specialty. Verana Health’s VeraQ® population health data engine uses clinician-guided ML and NLP to surface relevant clinical notes, enabling comprehensive lists of relevant key phrases or terms that can improve cohort expansion, increase the velocity of algorithm development, and unlock scalable deployments on an entire cohort’s data. This helps make sense of data by using supervised or rules-based approaches on the text-based notes of more than 90 million de-identified patients.
For example, an initial assessment of data on more than 364,000 de-identified patients with prostate cancer used explicit staging documentation of “M1” to identify those who developed metastasis and found a total of 6,000 patients. When that analysis was expanded to include other key phrases in clinical notes that also indicated metastasis, 29,000 patients were found, a nearly five-fold increase. Innovative ML and NLP techniques enable contextualization of important inflection points throughout the patient journey, even when there is heterogeneity in how these clinical characteristics or outcomes are documented.
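The effect of widening the phrase list can be illustrated with a small rules-based sketch in Python. The notes and patterns below are invented for illustration; a real phrase list would be clinician-curated, and a production approach would also need to handle negation (for example, "no evidence of metastasis"), which this sketch does not.

```python
import re

# Hypothetical patterns for illustration only.
EXPLICIT_STAGING = re.compile(r"\bM1[abc]?\b")
EXPANDED_PHRASES = re.compile(
    r"\bM1[abc]?\b|\bmetasta(?:sis|ses|tic)\b|\bbone mets\b",
    re.IGNORECASE,
)

# Made-up clinical note snippets keyed by a de-identified patient id.
notes = {
    "p1": "Staging: T3 N1 M1b.",
    "p2": "Bone scan consistent with metastatic disease.",
    "p3": "PSA stable, no evidence of spread.",
    "p4": "New bone mets noted on imaging.",
}


def find_patients(pattern, notes_by_patient):
    """Return ids of patients whose note matches the pattern."""
    return sorted(
        pid for pid, text in notes_by_patient.items() if pattern.search(text)
    )


explicit = find_patients(EXPLICIT_STAGING, notes)
expanded = find_patients(EXPANDED_PHRASES, notes)
print(len(explicit), "patient(s) via explicit staging")
print(len(expanded), "patient(s) via expanded phrases")
```

In this toy data the explicit pattern finds only the patient with documented "M1b" staging, while the expanded list also picks up the two patients whose metastasis is described in free text.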
Each stakeholder group across the drug development lifecycle has distinct objectives and priorities when it comes to utilizing RWE.
For example, for a clinical development team, developing the protocol and determining patient cohort criteria for a clinical study is a critical first step. Organizations will need to define the specific patient populations for their study as well as consider disease states, treatment histories, and other relevant characteristics. However, it's important to ensure that the inclusion and exclusion criteria are well-suited for the study. The criteria need to be focused enough on the pertinent patient populations while remaining inclusive enough to meet recruitment targets on time. When paired with clinical guidance, RWD can provide valuable insights that help refine the inclusion and exclusion criteria to best meet enrollment and study goals, as well as provide a better understanding of what the current patient population looks like.
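To make the trade-off between focus and inclusiveness concrete, here is a small Python sketch that counts how many patients in a made-up dataset satisfy a set of criteria, and how that count changes when one criterion is relaxed. The diagnosis, age bounds, and field names are hypothetical examples, not criteria from any actual protocol.

```python
# Made-up de-identified patient rows for illustration.
patients = [
    {"id": "p1", "age": 67, "diagnosis": "wet AMD", "prior_anti_vegf": False},
    {"id": "p2", "age": 48, "diagnosis": "wet AMD", "prior_anti_vegf": False},
    {"id": "p3", "age": 72, "diagnosis": "wet AMD", "prior_anti_vegf": True},
    {"id": "p4", "age": 80, "diagnosis": "dry AMD", "prior_anti_vegf": False},
    {"id": "p5", "age": 55, "diagnosis": "wet AMD", "prior_anti_vegf": False},
]


def is_eligible(patient, min_age=50, max_age=85):
    """Apply hypothetical inclusion criteria, then the exclusion criterion."""
    included = (
        patient["diagnosis"] == "wet AMD"
        and min_age <= patient["age"] <= max_age
    )
    excluded = patient["prior_anti_vegf"]  # assumed exclusion: prior treatment
    return included and not excluded


strict = [p["id"] for p in patients if is_eligible(p)]
relaxed = [p["id"] for p in patients if is_eligible(p, min_age=45)]
print("strict:", strict)
print("relaxed:", relaxed)
```

Running the same comparison against real-world data gives an early read on whether a proposed criterion would shrink the eligible population more than the recruitment plan can tolerate.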
It’s also important to assess whether RWE insights are needed for a single period or require ongoing data collection and analysis. For example, to track market share over time, ongoing refreshed data is needed. It is important to consider the necessary frequency of data refreshes. Do you need real-time updates, quarterly summaries, or annual reports? The frequency should align with your decision-making timelines and the nature of the healthcare interventions being studied.

Consider what specific outcomes and details are crucial for analysis. This could include key patient outcomes such as treatment effectiveness, adverse events or quality of life metrics. Additionally, consider the level of detail required, such as:
- Brand level detail - market penetration, comparative effectiveness across different brands, brand switching patterns
- Patient level detail - adherence, persistence, response to therapy
- Practice level detail - geography, research experience, clinician subspecialty
By thoroughly assessing these needs, you can ensure that your approach to RWD is both strategic and efficient, leading to actionable insights that support decision-making across the healthcare landscape. Tailoring your RWD solutions to these specific needs will enhance the relevance and impact of your RWE efforts.
Different organizations have different needs for their RWD.
Commercial:
- Prioritizes quick access to RWD for assessing market dynamics of drugs on the market
- Key factors: Timeliness of data, market trends, patient demographics, prescribing patterns, coverage of the U.S. market
HEOR:
- Prioritizes clinical granularity and longitudinal follow-up to track patient outcomes over time
- Key factors: Clinical outcomes, disease subtype data, safety events, biomarkers, genetic tests
Clinical Development:
- Prioritizes access to RWD for protocol optimization and trial design or patient identification
- Key factors: Treatment patterns, disease prevalence, patient characteristics, practice location and research experience
As entities explore the integration of RWD and AI into their operations, it's essential to carefully evaluate the data sources and the AI capabilities available.
Assessing the Data Source and Proper Data Handling
The quality, recency, diversity, and relevance of the RWD sources play a crucial role in the reliability and validity of the insights derived from the analysis. It is critical to have access to high-quality data sources to ensure comprehensive coverage of relevant patient populations and healthcare settings.

High-quality data provides:
- Depth: Quality data provides the necessary level of detail and comprehensiveness.
- Validity: Data must be accurate, representative and reliable for the purposes that you’re researching.
- Traceability: Data that comes directly from the source plays an important role in traceability, since the more you can link the output to the source information, the more confidence you can have in the outcomes that are generated.
- Speed: The insights are only as good as the recency of the data. The more recent the information, the more timely the decisions companies can make based on RWE.
Quality data should also be handled with the utmost care and expertise in data security, compliance with regulatory requirements, and protocols to ensure its integrity and privacy. Applying AI to unstructured clinical notes from EHR data may be required to derive meaningful insights beyond what can be found in structured diagnosis or treatment codes alone.
Verana Health has exclusive partnerships with the American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) and the American Urological Association (AUA) Quality (AQUA) Registry. The IRIS Registry comprises nearly 80 million de-identified patients from 15,000 contributing clinicians over 11 years. The AQUA Registry comprises 12 million de-identified patients from more than 3,800 clinicians over 10 years. The data from both registries come directly from clinicians who care for patients in real-world settings and are leveraged in Verana Health’s datasets.
Evaluating AI, Data Science, and Clinical Capabilities
The quality of the data is only part of the equation. If the appropriate AI models or analysis aren’t being used, the insights generated will not reflect the quality of the data.
A variety of AI techniques and models exist, such as NLP, large language models, rules-based algorithms and more. RWE projects require a strong understanding of AI algorithm and model development, as well as data analytics techniques, to ensure the relevance, efficiency, robustness and scalability of the analysis conducted on RWD. Moreover, data science expertise, including proficiency in developing and implementing AI models, is necessary to derive meaningful insights that maintain the integrity of complex healthcare data. Deployed techniques also need to be monitored and reassessed continuously, because longitudinal clinical data is frequently added to or refreshed.
Additionally, subject matter expertise, namely clinical domain knowledge, can provide valuable, clinically sound insights and place findings in the context of healthcare delivery and patient outcomes.
Choosing the Correct Data Delivery
For many organizations, data-as-a-service (DaaS) provides an easy, convenient, and secure method of accessing high-quality, de-identified clinical RWD for research. Additionally, data delivery can include a powerful self-service dashboard that visualizes the data in summary. One example is Verana Health’s Qdata Anti-VEGF Market Tracker, which actively tracks real-world therapy usage, rolling market share and other KPIs of interest. The data is refreshed monthly. Such solutions empower organizations to leverage RWD in day-to-day decision-making.
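For readers unfamiliar with the term, a rolling market share can be computed as a brand's share of all treatments within a sliding window of months. The Python below is a generic sketch of that calculation on made-up monthly counts; the brand names, the three-month window, and the `rolling_share` helper are assumptions for illustration and do not describe how any specific dashboard computes its figures.

```python
# Made-up monthly treatment counts per brand, oldest month first.
monthly_counts = [
    {"BrandA": 60, "BrandB": 40},
    {"BrandA": 50, "BrandB": 50},
    {"BrandA": 40, "BrandB": 60},
    {"BrandA": 30, "BrandB": 70},
]


def rolling_share(counts, brand, window=3):
    """Share of `brand` among all treatments in each `window`-month span."""
    shares = []
    for end in range(window, len(counts) + 1):
        months = counts[end - window:end]
        brand_total = sum(month.get(brand, 0) for month in months)
        overall_total = sum(sum(month.values()) for month in months)
        shares.append(brand_total / overall_total if overall_total else 0.0)
    return shares


brand_a_share = rolling_share(monthly_counts, "BrandA")
print(brand_a_share)
```

Each monthly refresh adds one entry to the input and therefore one new window to the output, which is why the refresh cadence determines how current the trend line can be.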
Verana Health offers DaaS for its Qdata® modules. These modules start with structured and unstructured EHR data on more than 90 million de-identified patients.
The data are harmonized, tokenized, and can be linked with other sources, such as claims and images in VeraQ. Finally, the data are curated with clinical expertise and guidance, complemented by NLP and ML. The result is Qdata: research-ready, fit-for-purpose data modules that enable observational studies on a multitude of conditions across ophthalmology and urology.
On the other hand, some organizations may not have the in-house resources required to manage such a large amount of data. In these cases, Verana Health can offer customized projects, in which the team analyzes Qdata on the organization’s behalf and delivers insights tailored to unique requirements and objectives.
In an era where data-driven decision-making is crucial, leveraging RWD and AI to generate RWE empowers organizations to have a stronger handle on how their therapies are performing in the real world, as well as against others in the market. From optimizing clinical trial designs to informing market strategies and improving patient outcomes, the integration of RWD and AI offers unprecedented opportunities for stakeholders across the industry.

As technology continues to advance and new data sources emerge, the importance of strategic partnerships cannot be overstated. By embracing the principles outlined in this guide and staying at the forefront of RWE generation, organizations can lead the way in shaping the future of healthcare and delivering value to patients, clinicians, and healthcare systems alike. Together, we can harness the power of RWD and AI to drive positive outcomes and improve the quality of care for all.
