Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study.

BACKGROUND

The limitations of existing HIV risk prediction tools are a barrier to implementation of pre-exposure prophylaxis (PrEP). We developed and validated an HIV prediction model to identify potential PrEP candidates in a large health-care system.

METHODS

Our study population was HIV-uninfected adult members of Kaiser Permanente Northern California, a large integrated health-care system, who were not yet using PrEP and had at least 2 years of previous health plan enrolment with at least one outpatient visit from Jan 1, 2007, to Dec 31, 2017. Using 81 electronic health record (EHR) variables, we applied least absolute shrinkage and selection operator (LASSO) regression to predict incident HIV diagnosis within 3 years on a subset of patients who entered the cohort in 2007-14 (development dataset), assessing ten-fold cross-validated area under the receiver operating characteristic curve (AUC) and 95% CIs. We compared the full model to simpler models including only men who have sex with men (MSM) status and sexually transmitted infection (STI) positivity, testing, and treatment. Models were validated prospectively with data from an independent set of patients who entered the cohort in 2015-17. We computed predicted probabilities of incident HIV diagnosis within 3 years (risk scores), categorised as low risk (<0·05%), moderate risk (0·05% to <0·20%), high risk (0·20% to <1·0%), and very high risk (≥1·0%), for all patients in the validation dataset.
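
The risk-score bands and the AUC metric described above can be sketched in a few lines of Python. This is an illustration only, not the study's code: the names `risk_category` and `auc` and the toy scores are hypothetical, and the AUC shown is the plain pairwise (Mann-Whitney) estimate, without the ten-fold cross-validation or confidence intervals used in the study. Only the four thresholds are taken from the text.

```python
# Illustrative sketch; function names and toy data are assumptions, not from the study.

def risk_category(probability):
    """Map a predicted 3-year HIV probability (0 to 1) to the bands in the text.

    Thresholds from the abstract, expressed as probabilities:
    <0.05% low, 0.05% to <0.20% moderate, 0.20% to <1.0% high, >=1.0% very high.
    """
    if probability < 0.0005:
        return "low"
    if probability < 0.002:
        return "moderate"
    if probability < 0.01:
        return "high"
    return "very high"


def auc(scores, labels):
    """Pairwise AUC: the share of (case, non-case) pairs in which the case
    has the higher score, counting ties as half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical toy scores, for demonstration only.
toy_scores = [0.0001, 0.0040, 0.0035, 0.0200]
toy_labels = [0, 0, 1, 1]
toy_bands = [risk_category(p) for p in toy_scores]
toy_auc = auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of patients, so it suits small examples; a rank-based routine would be the usual choice at the scale of the cohort described here.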

FINDINGS

Of 3 750 664 patients in 2007-17 (3 143 963 in the development dataset and 606 701 in the validation dataset), there were 784 incident HIV cases within 3 years of baseline. The LASSO procedure retained 44 predictors in the full model, with an AUC of 0·86 (95% CI 0·85-0·87) for incident HIV cases in 2007-14. Model performance remained high in the validation dataset (AUC 0·84, 0·80-0·89). The full model outperformed simpler models including only MSM status and STI positivity. For the full model, flagging 13 463 (2·2%) patients with high or very high HIV risk scores in the validation dataset identified 32 (38·6%) of the 83 incident HIV cases, including 32 (46·4%) of 69 male cases and none of the 14 female cases. The full model had equivalent sensitivity by race whereas simpler models identified fewer black than white HIV cases.
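
The percentages reported for the validation dataset follow directly from the counts given above. A minimal arithmetic check, assuming a hypothetical helper `pct` and variable names chosen here for illustration:

```python
# Counts as reported in the abstract; names are illustrative assumptions.
DEVELOPMENT_PATIENTS = 3_143_963
VALIDATION_PATIENTS = 606_701
FLAGGED_PATIENTS = 13_463      # high or very high risk score, validation dataset
INCIDENT_CASES = 83            # incident HIV cases in the validation dataset
MALE_CASES = 69
FEMALE_CASES = 14
FLAGGED_CASES = 32             # cases among flagged patients (all male)


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_patients = DEVELOPMENT_PATIENTS + VALIDATION_PATIENTS
share_flagged = pct(FLAGGED_PATIENTS, VALIDATION_PATIENTS)   # share of patients flagged
sensitivity_all = pct(FLAGGED_CASES, INCIDENT_CASES)         # cases identified, overall
sensitivity_male = pct(FLAGGED_CASES, MALE_CASES)            # cases identified, men
sensitivity_female = pct(0, FEMALE_CASES)                    # no female cases flagged

print(total_patients, share_flagged, sensitivity_all,
      sensitivity_male, sensitivity_female)
```

Here sensitivity means the share of incident cases that fell in the flagged (high or very high risk) group.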

INTERPRETATION

Prediction models using EHR data can identify patients at high risk of HIV acquisition who could benefit from PrEP. Future studies should optimise EHR-based HIV risk prediction tools and evaluate their effect on prescription of PrEP.

FUNDING

Kaiser Permanente Community Benefit Research Program and the US National Institutes of Health.

Abbreviation: Lancet HIV
Publication Date: 2019-07-05
Pubmed ID: 31285183
Medium: Print-Electronic
Full Title: Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study.
Authors: Marcus JL, Hurley LB, Krakower DS, Alexeeff S, Silverberg MJ, Volk JE