Improving Real-world Studies by Curating Datasets and Tokenizing De-identified Patients


Durga Borkar, MD, MMCi, Verana Health

At the 2021 meeting of the American Society of Retina Specialists in San Antonio, Texas, I presented data on behalf of a research team that sought to quantify and qualify the real-world patterns of patients with geographic atrophy (GA), an age-related disease that affects approximately 1 million patients in the United States and leads to irreversible blindness.1 Because there is no treatment for GA, many patients who are diagnosed with the disease are lost to follow-up (LTFU). Our team aimed to uncover which variables were risk factors for LTFU status.

In short, we found that LTFU status correlated with advanced age, male gender, having Medicaid or no health insurance, distance from provider, and having vision that was worse than 20/40 at baseline. Note that 20/40 visual acuity is the threshold for performing certain activities, including holding a driver’s license in many states.

One way researchers can track real-world patient behaviors is through analysis of insurance claims data. However, private insurance claims databases do not capture the healthcare utilization patterns of patients who have no insurance or who are covered by government payers (eg, Medicaid and Medicare). Because GA is an age-related disease, nearly all patients are eligible for Medicare. As such, private claims databases would have been insufficient for a research project such as this one. Given the strict inclusion/exclusion criteria in the study, the research team also determined that relying on Centers for Medicare & Medicaid Services (CMS) data would leave too many patients ineligible because of incomplete records and, as a result, would fail to provide a suitable population from which to draw reliable conclusions.

Curating Point-of-Care Data for GA Analysis

The American Academy of Ophthalmology’s IRIS® Registry (Intelligence in Research and Sight) is the world’s largest specialty clinical data registry, with more than 70 million unique patients and approximately 14,000 ophthalmologists participating at the time of my presentation. My team considered this dataset the most complete picture of real-world patient behavior. Furthermore, the IRIS Registry would facilitate speedy research thanks to its curated status, which is achieved by first ingesting the data, then harmonizing and tokenizing it, and lastly maintaining and protecting it using clinician oversight, natural language processing (NLP) and machine learning (ML).  

The value provided by Verana Health, the Academy’s data and technology partner, was fundamental to our research. By curating the IRIS Registry, Verana Health provided two distinct advantages: 

  • Tokenized patient identifiers. Some patients move between clinics for care. A patient who visits an office in Florida during the winter and another office in Michigan during the summer could be counted twice if the entries in the database aren’t flagged as duplicates.

    Patient privacy remains a paramount concern to Verana Health and the Academy, and all patient records in the IRIS Registry are de-identified. They are, however, tokenized. Tokenizing means linking a patient’s sensitive data to a unique cipher value assigned by a third party. Because the same patient receives the same token wherever he or she is seen, curators can recognize repeat entries without knowing who the patient is. In this way, Verana Health safeguards the IRIS Registry against duplicated data entries.
  • Rapid inclusion/exclusion for a retrospective study. Raw data are messy. Curated data aren’t. By working within a standardized dataset, our team could quickly include or exclude patients from this retrospective study.
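To make the idea of tokenization concrete, here is a minimal sketch in Python. It is not the method used by Verana Health or its tokenization partner, which this article does not describe. It assumes a keyed hash (HMAC-SHA-256) over normalized identifiers, and the key, field names, and patient records are all hypothetical. The point is only that two visits by the same person at different clinics map to the same token, so they can be counted once without storing a name or birth date.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the third-party tokenization service.
SECRET_KEY = b"example-key-held-by-third-party"


def make_token(first_name: str, last_name: str, birth_date: str) -> str:
    """Derive a stable token from normalized identifiers using a keyed hash."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def count_unique_patients(records: list[dict]) -> int:
    """Count distinct patients by token rather than by clinic-level entry."""
    return len({record["token"] for record in records})


# Made-up visits: the first two are the same person seen in two states.
raw_visits = [
    {"first": "Ann", "last": "Lee", "dob": "1940-02-01", "clinic": "Florida"},
    {"first": " ann", "last": "LEE ", "dob": "1940-02-01", "clinic": "Michigan"},
    {"first": "Bob", "last": "Ray", "dob": "1938-07-15", "clinic": "Michigan"},
]

# The de-identified records keep the token and drop the direct identifiers.
deidentified = [
    {"token": make_token(v["first"], v["last"], v["dob"]), "clinic": v["clinic"]}
    for v in raw_visits
]

unique_patients = count_unique_patients(deidentified)
```

In this sketch, three clinic entries resolve to two unique patients because the Florida and Michigan entries for the first patient share a token.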

In our case, we started with approximately 230,000 patients in the IRIS Registry who had an ICD-10 code for GA in at least 1 eye from 2016 to 2017. We only included patients who were in a practice that had been contributing data to the IRIS Registry for at least 2 years. We excluded patients with a history of choroidal neovascularization prior to GA in the study eye, as well as those with missing demographic or visual acuity data and those with a history of particular retinal disorders. After applying these inclusion and exclusion criteria, we were left with approximately 58,000 patients in the LTFU cohort and 85,000 patients in the cohort with 2 or more years of follow-up.
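The inclusion and exclusion steps described above can be sketched as a simple filter over standardized records. This is an illustration only, not the study's actual code: the record layout, field names, and example values are hypothetical, and the real criteria (for example, which retinal disorders were excluded) are more detailed than the flags used here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical curated record; field names are assumptions for this sketch."""
    token: str
    has_ga_code_2016_2017: bool          # ICD-10 code for GA in at least 1 eye
    practice_years_contributing: float   # years the practice has sent data
    prior_cnv_in_study_eye: bool         # choroidal neovascularization before GA
    age: Optional[int]                   # None stands in for missing demographics
    baseline_visual_acuity: Optional[str]
    excluded_retinal_disorder: bool


def is_eligible(p: PatientRecord) -> bool:
    """Apply the inclusion criteria first, then the exclusion criteria."""
    included = p.has_ga_code_2016_2017 and p.practice_years_contributing >= 2
    excluded = (
        p.prior_cnv_in_study_eye
        or p.age is None
        or p.baseline_visual_acuity is None
        or p.excluded_retinal_disorder
    )
    return included and not excluded


# Made-up records, one per common reason for inclusion or exclusion.
records = [
    PatientRecord("tok-a", True, 3.0, False, 81, "20/50", False),    # eligible
    PatientRecord("tok-b", True, 1.0, False, 77, "20/30", False),    # practice < 2 years
    PatientRecord("tok-c", True, 4.0, True, 84, "20/80", False),     # prior CNV
    PatientRecord("tok-d", True, 5.0, False, None, "20/40", False),  # missing demographics
    PatientRecord("tok-e", True, 2.0, False, 79, "20/25", False),    # eligible
]

eligible_tokens = [p.token for p in records if is_eligible(p)]
```

Because every record shares the same fields, each criterion becomes a single readable condition, which is the practical benefit of working within a standardized dataset.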

With potential GA treatments on the horizon, any insight we can gain into which risk factors are linked to disease progression, including likelihood of routine follow-up, can help ensure that a treatment reaches the patients who need it. Relying on a curated, real-world database for these observations may be one of the strong underpinnings required for clinicians to make informed treatment decisions.

This study was sponsored by Apellis Pharmaceuticals in collaboration with Verana Health.

1. Friedman DS, O’Colmain BJ, Munoz B, et al. Prevalence of age-related macular degeneration in the United States. Arch Ophthalmol 2004;122:564–572.
