Artificial Intelligence is Helping Unlock Real-World Data to Provide a Comprehensive View of Prostate Cancer 


Verana Health

Prostate cancer has one of the highest five-year survival rates of any form of cancer. However, it’s estimated that 10-20% of patients develop more severe, castration-resistant forms of this disease, and roughly 6% progress to metastatic prostate cancer. Traditional approaches to real-world evidence (RWE) have been used to identify signals and treatment patterns in patients with more severe disease, but they have fallen short of capturing the entire patient population. That’s mainly because the coding taxonomy in medical claims is inadequate for tracking disease progression, such as metastasis, castration resistance, or rising levels of prostate-specific antigen (PSA). To codify signals of disease progression and gain a more accurate and comprehensive view of the patient population, it’s imperative to apply artificial intelligence (AI)-driven large language models (LLMs) to RWE.

Using AI to Mine for Gold in Clinical Notes

In the real world, prostate cancer is often staged by a urologist at the point of diagnosis. Any further progression is documented not as a new TNM stage, but with other descriptors entered into the clinical notes of electronic health records (EHRs). For example, when the cancer metastasizes, the clinical note might not reference “M1.” Instead, it might state: “growing sites of metastasis on scan,” or “positive bone scan.” These alternative mentions of disease progression are strong clues that signal metastatic risk.
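As a rough illustration of why these alternative mentions matter, the sketch below flags notes using a small hand-written phrase list. It is a plain regular-expression stand-in, not the LLM-based approach described in this article. The pattern list and the function name `metastasis_mentions` are hypothetical; only the example phrases come from the text above.

```python
import re

# Hypothetical phrase list for illustration only. The phrases mirror the
# examples in the text; a production system would need far broader coverage.
METASTASIS_PATTERNS = [
    r"\bM1[abc]?\b",                  # explicit TNM M stage, e.g. "M1" or "M1b"
    r"\bsites? of metastas[ie]s\b",   # "sites of metastasis", "site of metastases"
    r"\bpositive bone scan\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in METASTASIS_PATTERNS]


def metastasis_mentions(note: str) -> list[str]:
    """Return every phrase in the note that hints at metastatic disease."""
    hits = []
    for pattern in _COMPILED:
        hits.extend(match.group(0) for match in pattern.finditer(note))
    return hits


if __name__ == "__main__":
    print(metastasis_mentions("Growing sites of metastasis on scan."))
    print(metastasis_mentions("PSA stable, no new complaints."))
```

A keyword screen like this has obvious gaps. It would still flag a negated statement such as "no new sites of metastasis," which is one of the limitations that model-based approaches aim to address.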

To unlock these critical insights from unstructured clinical notes, Verana Health uses AI-powered LLMs and cloud computing capabilities that can:

  • Ingest massive volumes of EHR data
  • Identify patterns in EHR data
  • Generate predictive outcomes based on pattern recognition 

Access to this granular, patient-specific data is made possible through Verana Health’s exclusive partnership with the American Urological Association (AUA) Quality (AQUA) Registry. The AQUA Registry is the largest urology patient registry of its kind, with a 10-year longitudinal database of more than 10 million de-identified patients cared for by more than 3,800 active clinicians. The ability to capture this data in near real time allows Verana Health to understand all aspects of individual patient experiences throughout their healthcare journeys.

Utilizing Deep Expertise to Develop Quality Insights

To extract meaningful insights from EHR data, Verana Health applies AI-powered LLMs and machine learning (ML) models that analyze patterns of language in unstructured clinical notes and signal key milestones in the patient journey. Most importantly, Verana Health’s team of clinical experts, which includes experienced urologists with deep expertise in data-driven research, continually trains the models and establishes rules for how the unstructured data is cataloged and categorized to make it useful in the real world.

Verana Health is unique among healthcare data and analytics providers in its ability to analyze this depth and breadth of real-world data (RWD) at scale. While some companies manually parse clinical notes for insights, and others have tried to automate the entire process, Verana Health is the only company of its kind to model patterns of language in this manner using the large volume of EHR data captured in the AQUA Registry.

Analyzing Patterns of Language to Expand the Patient Pool

A basic screen of AQUA Registry data on more than 364,000 de-identified patients with prostate cancer, using only the documentation of “M1” to flag metastasis, identified 6,000 patients. When the analysis was expanded to include other phrases in clinical notes that also indicate metastasis, 29,000 patients were identified, roughly a five-fold increase. As mentioned, these phrases can provide critical clues that signal metastatic risk or the development of castration-resistant forms of the disease.
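The arithmetic behind that comparison can be checked in a few lines. The sketch below uses only the rounded counts quoted above (the cohort is stated as "more than 364,000"), so the results are approximate; the variable names are illustrative.

```python
# Rounded counts quoted in the text; treat every result as approximate.
total_patients = 364_000   # de-identified patients with prostate cancer
m1_only = 6_000            # found by screening for "M1" alone
expanded = 29_000          # found after adding other metastasis phrases

fold_increase = expanded / m1_only
share_m1_only = m1_only / total_patients
share_expanded = expanded / total_patients

print(f"Fold increase: {fold_increase:.1f}x")          # Fold increase: 4.8x
print(f"Share found via M1 only: {share_m1_only:.1%}")  # 1.6%
print(f"Share found via expanded screen: {share_expanded:.1%}")  # 8.0%
```

In other words, the expanded screen raises the identified metastatic share of the cohort from about 1.6% to about 8.0%.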

Other key variables involved in the diagnosis and staging of prostate cancer are Gleason scores, which are based on biopsy samples and describe the aggressiveness of cancer cells, and PSA levels, which are measured through lab analysis and used to track disease progression. Such measures are not captured in standard medical claims databases, and because they are not structured EHR fields, they are not recorded the same way by every clinician. As a result, attempts to extract meaningful insights from these unstructured datasets have historically required labor-intensive manual searches that were inefficient and not at all scalable. With AI-driven models, Gleason scores are captured for 100% of patients in Verana Health’s prostate cancer dataset. Similarly, by analyzing patterns of diagnosis and patient PSA levels over time, it’s also possible to identify those with localized cancer and evaluate treatment patterns and outcomes.
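To make the free-text problem concrete, the sketch below pulls a Gleason score out of a note with a single regular expression. This is a simplified illustration, not Verana Health's AI-driven method. The pattern, the function name `extract_gleason`, and the sample notes are all hypothetical, and the code assumes the score is written as primary plus secondary pattern (for example "3+4").

```python
import re
from typing import Optional, Tuple

# Matches forms such as "Gleason 3+4", "Gleason score: 4 + 3", "gleason grade 3+3".
# Gleason patterns range from 1 to 5, so each side is limited to one digit in 1-5.
GLEASON_RE = re.compile(
    r"gleason(?:\s+(?:score|grade|sum))?\s*:?\s*([1-5])\s*\+\s*([1-5])",
    re.IGNORECASE,
)


def extract_gleason(note: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) for the first match, or None."""
    match = GLEASON_RE.search(note)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    return primary, secondary, primary + secondary


if __name__ == "__main__":
    print(extract_gleason("Biopsy: Gleason 3+4=7 in 4 of 12 cores"))
    print(extract_gleason("PSA 6.2 ng/mL, repeat in 3 months"))
```

A single pattern like this misses many real-world variants, such as a note that records only the total ("Gleason 7"), which illustrates why inconsistent documentation is hard to handle with manual rules alone.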

Capturing a Complete View of the Patient Population

The next frontier of RWE is using AI and LLMs to unlock deeper insights from both structured and unstructured datasets. By training algorithms on rules and nuanced interpretations that practicing clinicians develop, continually refine, and validate, Verana Health can deliver the highest quality of data and the best insights into the healthcare journey of patients with prostate cancer.

