3 Questions to Ask Before Leveraging Artificial Intelligence to Generate Real-World Evidence


Lawrence Whittle, President

Earlier this month, I had the pleasure of hosting an Endpoints Webinar, which ChatGPT described as the “Super Bowl of artificial intelligence (AI), in the context of real-world evidence (RWE), and why it’s so compelling.” Joining me were my esteemed colleague Aracelis Torres, PhD, MPH, our SVP of Data & Science, and Vera Mucaj, the chief scientific officer at Datavant, our tokenization partner. The goal of the webinar was to equip life sciences companies with 3 important questions to ask before engaging in projects that apply AI to real-world data (RWD) to generate RWE.

(Pictured top left to right: Vera Mucaj, Datavant; and Kari Abitbol, Endpoints News; Pictured bottom left to right: Aracelis Torres and Lawrence Whittle, Verana Health)

Before delving into these 3 vital questions, I’d first like to highlight the opportunities for RWE across the drug development lifecycle. RWE is instrumental in helping to:

  • Inform study design and identify potential clinical trial participants
  • Accelerate the development of therapies by characterizing patient burden and tracking disease progression
  • Understand real-world treatment patterns, outcomes, and effectiveness
  • Track market share

Understanding AI and outcomes

AI can be overwhelming, especially with the varied terminology and constant news about the good, the bad and the indifferent of this technology. However, advanced AI techniques such as machine learning (ML) and natural language processing (NLP) are integral to leveraging RWD to develop RWE that can help inform research and improve patients’ lives. To really understand outcomes, it is extremely important to understand both the RWD and the AI models being used to analyze that data. In other words, near-perfect data and a near-perfect AI model are both necessary to achieve near-perfect outcomes.

Defining RWD 

We define RWD as patient data collected from real-life interactions. This includes structured data from electronic health records (EHRs) and claims data. It also includes unstructured data found within the clinician notes of EHRs. It’s estimated that 80% of healthcare data is unstructured.

Historically, structured data has primarily been used for RWE generation because of its ease of use, and to help answer the question of “what” (e.g., what medications were used). However, the unstructured data found in clinician notes tells the story of the “why” (e.g., why a particular medication was prescribed, or why a particular diagnosis was made). Unstructured data can also be critical in measuring disease progression (e.g., imaging data that shows tumor or lesion growth over time).

The Questions That Can Help Guide Your Next RWD Project

This leads us to 3 important questions life sciences companies should ask data partners before launching their next RWD project.

  • Question #1: Is the data high quality? Make sure the data meets the following requirements:
    • Depth – the data includes sufficient detail and comprehensiveness. One example is EHR data, which is captured at the point of clinical care and provides additional context.
    • Validity – the degree to which the data is accurate and reliable for the purposes of your research. For example, Verana Health has an exclusive partnership with the American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight), which comprises more than 80 million de-identified patients from 15,000 contributing clinicians over 11 years.
    • Direct from the Source – it’s critical to understand where the data is coming from and the processing steps it undergoes (e.g., IRIS Registry data comes directly from clinicians who cared for the patients). This is important in understanding the traceability of the data and whether it’s prepared to undergo an audit. The more you can link the output to the source information, the more you can rely on, and have confidence in, the outcomes that are generated.
    • Speed – the insights are only as good as the recency of the data. At Verana Health, we’re working with data that takes less than a month to go from the point at which it was entered into the EHR system to the point at which it makes its way into a dataset to inform decision-making. The more recent the information we can place into the hands of clinicians and life sciences companies, the more timely the actions that can be taken on those insights.
  • Question #2: What is your process for using AI?

Even if you begin with quality data, you can end with imperfect results if you’re not using the appropriate model or analysis to generate those insights. Firstly, it’s important to select the right model by understanding which AI technique to use (e.g., ML model, rules-based algorithm, or large language model). Secondly, it’s important to understand how the model was developed and tested, and whether the training data is recent. The landscape in healthcare moves at a rapid rate, so if the data is even a year old, it likely will not reflect some of the current treatment patterns or available therapies on the market. Thirdly, determine if the data is proprietary and whether the data used and included in the model will be made available externally. Lastly, in the world of AI models, privacy is important, and healthcare data should be managed to the highest standards. We partner with Datavant, which creates irreversible, site-specific encrypted tokens for each patient record. These tokens preserve patient privacy, while ensuring datasets remain linked during the process. It’s often ideal to link multiple data sources to connect the “what” and the “why” in a single study.
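To make the idea of irreversible, key-dependent tokens more concrete, here is a minimal sketch in Python. It is an illustration only and does not represent Datavant's actual method, which is not described in this post. It assumes a keyed one-way hash (HMAC-SHA256) over a few normalized identifying fields; the function names, field choices, and keys are all hypothetical.

```python
import hashlib
import hmac


def normalize(first_name: str, last_name: str, birth_date: str) -> bytes:
    """Standardize fields so trivial formatting differences don't change the token."""
    parts = (first_name.strip().lower(), last_name.strip().lower(), birth_date.strip())
    return "|".join(parts).encode("utf-8")


def make_token(site_key: bytes, first_name: str, last_name: str, birth_date: str) -> str:
    """Return a one-way token that depends on both the patient fields and the key."""
    message = normalize(first_name, last_name, birth_date)
    return hmac.new(site_key, message, hashlib.sha256).hexdigest()


# Hypothetical keys for illustration; real keys would be generated and stored securely.
SITE_A_KEY = b"example-key-for-site-a"
SITE_B_KEY = b"example-key-for-site-b"

# The same patient appearing in two datasets that were tokenized with the same key.
ehr_token = make_token(SITE_A_KEY, "Jane", "Doe", "1960-04-12")
claims_token = make_token(SITE_A_KEY, " jane ", "DOE", "1960-04-12")

# The same patient tokenized with a different key.
other_site_token = make_token(SITE_B_KEY, "Jane", "Doe", "1960-04-12")

print(ehr_token == claims_token)      # True: records can be linked without exposing identity
print(ehr_token == other_site_token)  # False: tokens made with different keys do not match
```

In this sketch, records tokenized with the same key can be joined on the token alone, which is what allows the “what” from one dataset to be connected to the “why” from another without sharing names or birth dates. Linking tokens created under different keys would require an additional, controlled translation step that is outside the scope of this example.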

  • Question #3: Do you have deep subject matter expertise?

Having the appropriate experts on staff, who specialize in the specific disease area that you’re trying to tackle, is crucial. You also want to understand the roles of these experts. For example, at Verana Health, we have a team of physicians who specialize in ophthalmology, neurology and urology. This team is involved throughout the entire data curation process to ensure that each step has clinical validation. You want to ensure these experts are able to speak to what’s happening in the real world in terms of clinical care, which should be reflected in the output generated. Involving the clinical voice at every step will help ensure that the model isn’t being used in a silo and that you’re coordinating with the key individuals who understand how the information entered into EHRs is produced in practice. In addition to medical experts, you also want to ensure any technical teams you’re working with have experience in developing and implementing AI, as well as overseeing maintenance and monitoring of your deployed models.

To learn more about Verana Health, and to view case studies and use cases that highlight the power and impact of high-quality RWD, watch the entire webinar on demand here.

