What Elevating Quality in Real-world Data Means to Verana Health


Mike Mbagwu, MD, Senior Medical Director; Dominique Connolly, RN, NP, VP of Clinical Data Strategy; Aracelis Torres, PhD, MPH, SVP of Customer Solutions and Quantitative Sciences; Yana Nikitina, VP of Engineering; Amanda Lien, Senior Product Manager

As a digital health company that is elevating quality in real-world data (RWD), we are laser-focused on applying technology to generate actionable and meaningful insights. Done right, high-quality de-identified RWD can help accelerate medical innovations and improve the quality of care and quality of life for patients through a variety of applications across the healthcare ecosystem.

What is quality RWD? 

Quality curated RWD is the product of a collaborative effort, and the data should be as complete, accurate, and timely as possible for the use case at hand. The exclusive data entrusted to us by medical societies has been ingested from more than 70 electronic health record (EHR) systems and is de-identified, normalized, harmonized, and curated using data from both structured EHR fields and unstructured physician notes. It is often tokenized and linked with other data sources, such as claims and images.

Data quality involves applying a clinically informed process at each step, from ingestion to curation to application, including development of Qdata modules. We achieve this with a multidisciplinary team of clinicians, nurses, clinical informaticians, data scientists, epidemiologists, biostatisticians, engineers, and others, who together decide how to curate and standardize the data while retaining its original clinical context. To unlock the full potential of RWD for stakeholders (patients, providers, regulators, and life sciences), the quality of the data being generated must remain front and center.

Structured vs. unstructured EHR data and data quality implications

Structured data can often be analyzed directly with standard statistical or machine learning methods, since it already exists in a fixed structure. However, the potential of EHR data to unlock new use cases lies in extracting the important insights found within unstructured data, including clinical notes and other sources of unstructured text or images.

An estimated 80% of healthcare data in an EHR is unstructured. Key information such as a patient’s symptoms and experience, physical exam findings, results of diagnostic tests, and clinical decision-making on assessment and plan are all found in clinical notes. Other important patient care information can also be found in unstructured text such as operative notes, radiology reports, pathology reports, and send-out test results, such as genetics. Transforming unstructured healthcare documentation into research-ready datasets at scale must be grounded in scientific principles to ensure data quality and the credibility of the evidence that is generated. 

To maximize actionable insights into the patient journey, it is critical to combine well-defined structured data and high-quality unstructured data in a consistent and reliable way. As a result, the key question becomes: How do we build a platform that helps transform unstructured data into a research-ready resource at scale?

How Verana Health drives data quality
At Verana Health, measurement of data quality is built into each step of the process, from the integration of data from more than 70 EHR systems through each stage of data transformation, including the development of our use-case-specific datasets, the high-quality Qdata modules. By building in quality considerations relevant to each step of the data pipeline at a systemic level, we can measure quality continuously and routinely across the transformation process. This transparent methodology builds trust in RWD and enables scaling as more clinicians contribute their data to our RWD network, currently more than 20,000 healthcare providers (HCPs), via the qualified clinical data registries we manage for specialty medical societies. To this end, Verana Health has created a systematic process to generate quality reporting dashboards for HCPs, which are powered by our population health data engine, called VeraQ.

The process to achieve data quality
Verana Health has constructed a scalable and clinically based data quality assessment process that includes clinician input and artificial intelligence—namely natural language processing (NLP) and machine learning (ML)—to bring both structure and meaning to RWD at scale. 

This process helps to generate data quality reports for each step of the curation pipeline and for each data refresh at the practice, provider, and EHR levels.

Our assessment process is consistent, but flexible enough to be tailored to different use cases and data models. It is designed to quantitatively score provider-, practice-, and EHR-level data on a matrix of six data quality dimensions across three classes: technical, clinical, and scientific.

Data quality assessment

When Verana Health considers data quality, we emphasize six key characteristics:

  1. Completeness: Does the data encompass the entire clinical process?
  2. Accuracy: Does the data accurately reflect the patient chart/reality?
  3. Traceability: Does the data contain provenance back to the source?
  4. Consistency: Does the data maintain integrity across structures, time, and releases?
  5. Generalizability: Does the data represent a minimally biased sample?
  6. Timeliness: Does the data reflect recent clinical practice patterns?

Looking at these parameters helps us evaluate the quality of our technical processes, such as ingestion and harmonization, and understand the quality of our curation. The information is used to make the hard decisions of whether or not to include key data elements or variables in a cohort, where a fully manual review versus an AI-enabled approach is appropriate, and how to structure data to guide critical decision-making in the research and care settings.
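
Two of the dimensions listed above, completeness and timeliness, lend themselves to a simple numeric illustration. The following is a minimal sketch in Python, not Verana Health's actual scoring method: the record fields (`practice_id`, `visit_date`, `diagnosis_code`, `lab_result`), the sample values, the 365-day recency window, and all function names are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical de-identified records; every field name and value is illustrative.
RECORDS = [
    {"practice_id": "A", "visit_date": date(2024, 5, 1), "diagnosis_code": "dx-001", "lab_result": "7.1"},
    {"practice_id": "A", "visit_date": date(2024, 6, 15), "diagnosis_code": None, "lab_result": "6.8"},
    {"practice_id": "B", "visit_date": date(2022, 1, 10), "diagnosis_code": "dx-002", "lab_result": None},
    {"practice_id": "B", "visit_date": date(2024, 6, 20), "diagnosis_code": "dx-002", "lab_result": None},
]

# Assumed set of fields a given use case requires to be populated.
REQUIRED_FIELDS = ("diagnosis_code", "lab_result")


def completeness(records, fields=REQUIRED_FIELDS):
    """Fraction of required field values that are populated."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total


def timeliness(records, as_of, max_age_days=365):
    """Fraction of records dated within max_age_days before as_of."""
    if not records:
        return 0.0
    recent = sum(
        1 for r in records if 0 <= (as_of - r["visit_date"]).days <= max_age_days
    )
    return recent / len(records)


def score_by_practice(records, as_of):
    """Group records by practice and score each group on both dimensions."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["practice_id"]].append(r)
    return {
        pid: {
            "completeness": completeness(rs),
            "timeliness": timeliness(rs, as_of),
        }
        for pid, rs in sorted(grouped.items())
    }


scores = score_by_practice(RECORDS, as_of=date(2024, 7, 1))
for practice_id, dims in scores.items():
    print(practice_id, dims)
```

The same grouping could be keyed on a provider or EHR identifier to produce scores at those levels. Dimensions such as accuracy or generalizability would need a reference standard or a comparison population, so they are not covered by this sketch.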

Data quality informing quality of care and life

Verana Health is fundamentally a medically informed data and technology organization. Clinicians are involved in creating quality data, and we leverage some of the industry’s best tools and talent on the operational side. Our ultimate purpose is ongoing, rigorous, dynamic evaluation and analysis to achieve the most complete and highest quality data possible. Quality data informs trusted research that ultimately helps improve the quality of care and life for patients.
