Measuring Patient Data Density in EHRs to Understand Bias

This project develops the EHR Density Index (EDI), a covariate that quantifies how much a patient's documentation deviates from that of clinically similar patients, so that EHR studies can adjust for documentation-density bias. The index was developed and validated on ~25,000 UNC Health patients.

Patients vary widely in how much data their electronic health records contain, and sicker patients tend to accumulate more. Existing comorbidity indices such as Charlson and Elixhauser were designed to measure mortality risk, not to control for documentation density itself. This project, carried out with Emily Pfaff at the NC TraCS Institute, develops a new covariate that gives researchers an additional tool for adjusting analyses for uneven EHR data.

A first manuscript is under review at the Journal of the American Medical Informatics Association. It introduces the EHR Density Index (EDI), which pairs utilization clusters from a Gaussian mixture model with within-cluster residuals across four OMOP domains (conditions, drugs, measurements, and procedures) to characterize how a patient’s documentation deviates from that of others with similar care patterns. The index was developed and validated on ~25,000 UNC Health patients.
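
One way to picture "utilization clusters plus within-cluster residuals" is the minimal sketch below. It is an illustration under stated assumptions, not the manuscript's implementation: the `log1p` transform, the choice of three clusters, the z-score form of the residual, and the function name `density_residuals` are all hypothetical, and the counts are synthetic rather than real patient data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# The four OMOP domains named in the text; one count column per domain.
DOMAINS = ["conditions", "drugs", "measurements", "procedures"]


def density_residuals(counts, n_clusters=3, seed=0):
    """Return (cluster labels, within-cluster z-scores) for per-patient counts.

    counts: array-like of shape (n_patients, len(DOMAINS)).
    """
    # Assumption: log1p tames the heavy right tail of record counts.
    X = np.log1p(np.asarray(counts, dtype=float))

    # Step 1: group patients with similar utilization using a Gaussian mixture.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    labels = gmm.fit_predict(X)

    # Step 2: for each domain, measure how far a patient sits from the
    # empirical mean of their own cluster, scaled by that cluster's spread.
    resid = np.zeros_like(X)
    for k in np.unique(labels):
        members = labels == k
        mu = X[members].mean(axis=0)
        sd = X[members].std(axis=0)
        sd[sd == 0] = 1.0  # guard against constant columns
        resid[members] = (X[members] - mu) / sd
    return labels, resid


# Synthetic counts only: three made-up utilization levels, 200 patients each.
rng = np.random.default_rng(0)
counts = np.vstack(
    [rng.poisson(lam=lam, size=(200, len(DOMAINS))) for lam in (5, 40, 300)]
)
labels, resid = density_residuals(counts)
```

In this sketch, a large positive value in `resid` marks a patient with more documentation in that domain than others in the same utilization cluster, and a large negative value marks less. How the published index combines or scales these residuals is not specified here.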
