Improving measurement of binary covariates in claims data: A simulation study.



When investigators have two claims-based definitions for a binary confounder, it is unclear whether to prefer the more sensitive or more specific definition. Our objective was to compare adjusting for the sensitive or specific definition alone vs two novel approaches combining both definitions: a "two-algorithm indicator" and a "two-algorithm restriction" approach.


Each simulated patient had a binary exposure, outcome, and confounder. We created two nested, misclassified versions of the confounder using validated heart failure definitions. The sensitive definition had a sensitivity/specificity of 0.98/0.83, while the specific definition had a sensitivity/specificity of 0.77/0.99. Patients were classified into 3 groups: group 0 did not meet either definition, group 1 met the sensitive but not specific definition, and group 2 met both. The two-algorithm indicator approach adjusted using indicators for groups 1 and 2, while the two-algorithm restriction approach excluded patients in group 1 and adjusted using an indicator for group 2. Adjusted exposure odds ratios (ORs) were estimated for each approach using logistic regression.
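The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the function names, the confounder prevalence, and the cohort size are hypothetical, and nesting is enforced here by comparing a single uniform draw against both thresholds, which is one convenient way to guarantee that anyone meeting the specific definition also meets the sensitive one. Only the sensitivity and specificity values come from the abstract.

```python
import random

# Operating characteristics quoted in the abstract.
SENS_SENSITIVE, SPEC_SENSITIVE = 0.98, 0.83
SENS_SPECIFIC, SPEC_SPECIFIC = 0.77, 0.99


def classify(true_confounder, u):
    """Return the group (0, 1, or 2) for one patient.

    The same uniform draw u is compared against both thresholds, so a
    patient who meets the specific definition always meets the sensitive
    one as well (the two definitions are nested).
    """
    if true_confounder:
        p_sensitive, p_specific = SENS_SENSITIVE, SENS_SPECIFIC
    else:
        p_sensitive, p_specific = 1 - SPEC_SENSITIVE, 1 - SPEC_SPECIFIC
    meets_sensitive = u < p_sensitive
    meets_specific = u < p_specific
    # 0 = neither definition, 1 = sensitive only, 2 = both definitions
    return int(meets_sensitive) + int(meets_specific)


def simulate_groups(n, prevalence, seed=0):
    """Simulate n patients as (true confounder, group) pairs.

    n and prevalence are hypothetical inputs; the abstract does not
    report the values used in the study.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        truth = rng.random() < prevalence
        patients.append((truth, classify(truth, rng.random())))
    return patients


def indicator_covariates(group):
    """Two-algorithm indicator approach: indicators for groups 1 and 2."""
    return (int(group == 1), int(group == 2))


def restriction_cohort(patients):
    """Two-algorithm restriction approach: drop group 1 and keep a
    single indicator for group 2."""
    return [(truth, int(group == 2)) for truth, group in patients if group != 1]
```

In a full analysis, the covariates returned by `indicator_covariates` (or the restricted cohort) would enter a logistic regression of the outcome on exposure, as the abstract describes; that model-fitting step is omitted from this sketch.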


The crude OR was 1.33 (95% CI, 1.07-1.63). Adjusting for the specific and sensitive definitions resulted in ORs of 1.09 (95% CI, 0.87-1.35) and 1.14 (95% CI, 0.91-1.40), respectively. The two-algorithm indicator method returned an OR of 1.07 (95% CI, 0.86-1.33). The two-algorithm restriction approach returned an OR of 1.02 (95% CI, 0.79-1.29) but excluded 20% of the cohort.


The two-algorithm indicator approach may improve adjustment for claims-based confounders by returning a point estimate at least as unbiased as the better of the two component definitions.

Pharmacoepidemiol Drug Saf
Connolly JG, Glynn RJ, Schneeweiss S, Gagne JJ