Many cancer registries do not capture recurrence; thus, outcome studies have often relied on time-intensive and costly manual chart review. Our goal was to build an effective and efficient method that reduces the number of chart reviews needed to identify subsequent breast cancer (BC) using pathology reports and electronic health records. We evaluated the method in an independent sample.
We developed methods for identifying subsequent BC (recurrence or second primary) in a cohort of 17,245 women diagnosed with early-stage BC at 2 health plans. Our hybrid approach combined information from pathology report reviews with an automated data algorithm applied to lesions without pathologic confirmation. Test characteristics were determined in a development set (N=175) and a test set (N=500).
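The hybrid triage described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names `pathology_confirmed` and `algorithm_flag`, the `Route` labels, and the rule that algorithm-flagged women without pathologic confirmation are sent to chart review are all hypothetical, because the abstract does not spell out the exact decision rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Route(Enum):
    """Where a woman's subsequent-BC status is decided (hypothetical labels)."""
    PATHOLOGY = "pathology-confirmed"
    NO_EVENT = "no evidence of subsequent BC"
    CHART_REVIEW = "manual chart review"


@dataclass(frozen=True)
class Woman:
    # Both fields are hypothetical stand-ins for the study's real inputs.
    pathology_confirmed: bool  # pathology report review found subsequent BC
    algorithm_flag: bool       # automated EHR algorithm suggests subsequent BC


def route(w: Woman) -> Route:
    """Assumed triage rule: pathology first, then the automated algorithm."""
    if w.pathology_confirmed:
        return Route.PATHOLOGY
    if w.algorithm_flag:
        # No pathologic confirmation, but the algorithm flags a possible lesion.
        return Route.CHART_REVIEW
    return Route.NO_EVENT


def chart_review_fraction(cohort: Iterable[Woman]) -> float:
    """Share of the cohort that still needs manual chart review."""
    routes = [route(w) for w in cohort]
    if not routes:
        return 0.0
    return sum(r is Route.CHART_REVIEW for r in routes) / len(routes)


cohort = [
    Woman(pathology_confirmed=True, algorithm_flag=False),
    Woman(pathology_confirmed=False, algorithm_flag=True),
    Woman(pathology_confirmed=False, algorithm_flag=False),
    Woman(pathology_confirmed=False, algorithm_flag=False),
]
print(chart_review_fraction(cohort))  # 0.25
```

Checking pathology first means the automated algorithm is consulted only for lesions without pathologic confirmation, mirroring the order described in the methods; only the residual flagged group incurs manual review.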
In the development set, the sensitivity and specificity of the hybrid approach were high [96.7% (87.6%-99.4%) and 92.1% (85.1%-96.1%), respectively]. In the test set, sensitivity, specificity, and negative predictive value were also high [96.9% (88.4%-99.5%), 92.4% (89.4%-94.6%), and 99.5% (98.0%-99.9%), respectively], whereas the positive predictive value was lower [65.6% (55.2%-74.8%)]. Chart review was required for 10.9% of the 17,245 women; 2946 (17.0%) women developed subsequent BC over a 14-year period. The dates of subsequent BC identified by the algorithm were concordant with those from full chart review.
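The test characteristics above all derive from a single 2×2 table comparing the hybrid approach with chart review as the reference standard. The sketch below computes them with a Wilson score interval. The abstract does not state which interval method was used, so Wilson is an assumption here, and the counts in the example are invented toy values, not study data.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """(estimate, lower, upper) for each measure; chart review is the reference standard."""
    measures = {
        "sensitivity": (tp, tp + fn),  # flagged among women with true subsequent BC
        "specificity": (tn, tn + fp),  # not flagged among women without subsequent BC
        "ppv": (tp, tp + fp),          # true subsequent BC among flagged women
        "npv": (tn, tn + fn),          # no subsequent BC among unflagged women
    }
    results = {}
    for name, (k, n) in measures.items():
        lower, upper = wilson_ci(k, n)
        results[name] = (k / n, lower, upper)
    return results


# Toy counts for illustration only (not the study's data).
results = diagnostic_accuracy(tp=30, fp=10, fn=2, tn=158)
for name, (est, lower, upper) in results.items():
    print(f"{name}: {est:.1%} ({lower:.1%}-{upper:.1%})")
```

Because a score interval always contains its own point estimate, a reported interval that excludes its estimate signals a transcription error, which makes this a quick consistency check on tabulated results.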
We developed an efficient and effective hybrid approach for determining subsequent BC occurrence and disease-free survival time that reduced the number of charts requiring manual review by approximately 90%.