Project

Measuring Patient Data Density in EHRs to Understand Bias

Active

Developing structured and unstructured data density metrics in electronic health records to assess bias in cohort construction, with special attention to patients with limited English proficiency.

Jan 1, 2024

Patients vary widely in how much data their electronic health records contain. Sicker patients tend to accumulate more, and existing comorbidity indices such as Charlson and Elixhauser were designed to measure mortality risk, not to control for documentation density itself. With Emily Pfaff at the NC TraCS Institute, this project develops a new covariate that gives researchers an additional tool for adjusting analyses for uneven EHR data density.

A first manuscript is under review at the Journal of the American Medical Informatics Association. It introduces the EHR Density Index (EDI), which pairs utilization clusters from a Gaussian mixture model with within-cluster residuals across four OMOP domains (conditions, drugs, measurements, and procedures) to characterize how a patient’s documentation deviates from that of others with similar care patterns. The index was developed and validated on 24,987 UNC Health patients.
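To make the idea of a within-cluster residual concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the manuscript's implementation: the cluster labels are taken as given (in the project they come from a Gaussian mixture model, which is not reproduced here), the residual is assumed to be a patient's log-scaled record count minus the mean for their cluster, and the function name `within_cluster_residuals`, the field names, and the toy patients are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

# The four OMOP domains named in the project description.
DOMAINS = ("conditions", "drugs", "measurements", "procedures")


def within_cluster_residuals(patients):
    """Return {patient_id: {domain: residual}}.

    Assumption: residual = log1p(record count) - cluster mean of log1p(count).
    The EDI's actual residual definition may differ.
    """
    # Group patients by their (pre-assigned) utilization cluster.
    by_cluster = defaultdict(list)
    for p in patients:
        by_cluster[p["cluster"]].append(p)

    # Mean log-scaled record count per cluster and domain.
    cluster_means = {
        c: {d: mean(math.log1p(m[d]) for m in members) for d in DOMAINS}
        for c, members in by_cluster.items()
    }

    # Each patient's deviation from peers in the same cluster.
    return {
        p["id"]: {
            d: math.log1p(p[d]) - cluster_means[p["cluster"]][d]
            for d in DOMAINS
        }
        for p in patients
    }


# Hypothetical toy data: record counts per domain, cluster labels assumed known.
patients = [
    {"id": "a", "cluster": 0, "conditions": 2, "drugs": 1, "measurements": 10, "procedures": 0},
    {"id": "b", "cluster": 0, "conditions": 8, "drugs": 5, "measurements": 40, "procedures": 2},
    {"id": "c", "cluster": 1, "conditions": 30, "drugs": 20, "measurements": 300, "procedures": 10},
    {"id": "d", "cluster": 1, "conditions": 30, "drugs": 20, "measurements": 300, "procedures": 10},
]

residuals = within_cluster_residuals(patients)
for pid, per_domain in residuals.items():
    print(pid, {d: round(r, 3) for d, r in per_domain.items()})
```

In this toy example, patient "b" has positive residuals because their record is denser than that of their cluster peer "a", while "c" and "d" have residuals of zero because they match their cluster exactly. A positive residual here means "more documented than similar patients", not "sicker".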

Related Publications

Re-engineering a Machine Learning Phenotype to Adapt to the Changing COVID-19 Landscape: A Machine Learning Modelling Study from the N3C and RECOVER Consortia

The Lancet Digital Health

Miles Crosskey, Tomas McIntee, Sandy Preiss, Daniel Brannock, Yun Jae Yoo, Emily Hadley, Frank Blancero, Rob Chew, Johanna Loomba, Abhishek Bhatia, Christopher G. Chute, Melissa Haendel, Richard Moffitt, Emily R. Pfaff, N3C Consortium, the RECOVER EHR Cohort

Background In 2021, we used the National COVID Cohort Collaborative (N3C) as part of the NIH RECOVER Initiative to develop a machine learning (ML) pipeline to identify patients …

Jan 2025

Identifying Who Has Long COVID in the USA: A Machine Learning Approach Using N3C Data

The Lancet Digital Health

Emily R. Pfaff, Andrew T. Girvin, Tellen D. Bennett, Abhishek Bhatia, Ian M. Brooks, Rachel R. Deer, Jonathan P. Dekermanjian, Sarah Elizabeth Jolley, Michael G. Kahn, Kristin Kostka, Julie A. McMurry, Richard Moffitt, Anita Walden, Christopher G. Chute, Melissa A. Haendel, Carolyn Bramante, David Dorr, Michele Morris, Ann M. Parker, Hythem Sidky, Ken Gersing, Stephanie Hong, Emily Niehaus

Jan 2022

Regulatory Sandboxes: A Cure for mHealth Pilotitis?

Journal of Medical Internet Research

Abhishek Bhatia, Rahul Matthan, Tarun Khanna, Satchit Balsari

Mobile health (mHealth) and related digital health interventions in the past decade have not always scaled globally as anticipated earlier despite large investments by governments …

Jan 2020