Tree-based scan statistics (TreeScan) are a data-mining method that adjusts for multiple testing of correlated hypotheses when screening thousands of potential adverse events for signal identification. Simulation studies have demonstrated the promise of TreeScan with a propensity score (PS)-matched cohort design. However, it is unclear which variables to include in a PS so that applied signal identification studies adjust for confounding across all potential outcomes simultaneously. We selected 4 drug pairs with well-understood safety profiles. For each pair, we evaluated 5 candidate PSs built from different combinations of predefined general covariates (comorbidity, frailty, utilization), empirically selected (data-driven) covariates, and covariates tailored to the drug pair. For each pair, statistical alerting patterns were similar across the alternative PSs (≤11 alerts among 7,996 outcomes scanned). Including covariates tailored to the exposure did not appreciably change screening results. Including empirically selected covariates can provide better proxy coverage for confounders but can also decrease power. Unlike tailored covariates, empirically selected and predefined general covariates can be applied "out of the box" for signal identification. The choice of PS depends on the level of concern about residual confounding versus loss of power. Potential signals should be followed by a pharmacoepidemiologic assessment in which confounding control is tailored to the specific outcome(s) under investigation.
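The abstract does not spell out how TreeScan adjusts for multiple testing across correlated outcomes. The following is a minimal sketch, assuming the unconditional Bernoulli form of the tree-based scan statistic often paired with 1:1 matched cohorts. In that setup, under the null each outcome event falls in the exposed member of a matched pair with probability p = 0.5, assuming equal follow-up. Every node of a hierarchical outcome tree is scanned, and each node's log-likelihood ratio (LLR) is compared with the Monte Carlo distribution of the tree-wide maximum LLR. The toy tree, node names, event data, and function names are hypothetical. This is not the TreeScan software or the authors' implementation, and it omits the propensity score estimation and matching step entirely.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy outcome tree (child -> parent; None marks the root).
# Real applications use a large hierarchy such as a diagnosis-code tree.
PARENT = {
    "root": None,
    "cardio": "root", "mi": "cardio", "stroke": "cardio",
    "gi": "root", "bleed": "gi", "nausea": "gi",
}

def ancestors(leaf):
    """Yield the leaf and every node above it; each is a candidate cut."""
    node = leaf
    while node is not None:
        yield node
        node = PARENT[node]

def node_counts(events):
    """events: list of (leaf, exposed_flag). Returns per-node counts
    c (events in exposed patients) and n (all events)."""
    c, n = defaultdict(int), defaultdict(int)
    for leaf, exposed in events:
        for node in ancestors(leaf):
            n[node] += 1
            c[node] += int(exposed)
    return c, n

def bernoulli_llr(c, n, p):
    """One-sided Bernoulli LLR; zero unless c/n exceeds the null p."""
    if n == 0 or c / n <= p:
        return 0.0
    llr = c * math.log(c / (n * p))
    if c < n:
        llr += (n - c) * math.log((n - c) / (n * (1 - p)))
    return llr

def max_llr(events, p):
    """Largest LLR over all nodes that contain at least one event."""
    c, n = node_counts(events)
    return max(bernoulli_llr(c[g], n[g], p) for g in n)

def tree_scan(events, p=0.5, n_sims=999, seed=0):
    """Return {node: (LLR, Monte Carlo p-value)}."""
    rng = random.Random(seed)
    c, n = node_counts(events)
    observed = {g: bernoulli_llr(c[g], n[g], p) for g in n}
    leaves = [leaf for leaf, _ in events]
    # Null replicates: keep each event's outcome, redraw its exposure label.
    null_max = [max_llr([(leaf, rng.random() < p) for leaf in leaves], p)
                for _ in range(n_sims)]
    # Comparing each node with the null distribution of the tree-wide
    # maximum is what adjusts for testing many correlated nodes at once.
    return {g: (llr, (1 + sum(m >= llr for m in null_max)) / (n_sims + 1))
            for g, llr in observed.items()}

if __name__ == "__main__":
    # Hypothetical data: excess "bleed" events among exposed patients.
    events = ([("bleed", 1)] * 16 + [("bleed", 0)] * 4
              + [("mi", 1)] * 5 + [("mi", 0)] * 5
              + [("nausea", 1)] * 6 + [("nausea", 0)] * 6)
    results = tree_scan(events)
    for node, (llr, pval) in sorted(results.items(), key=lambda kv: -kv[1][0]):
        print(f"{node:8s} LLR={llr:6.2f} p={pval:.3f}")
```

Because every node is judged against the same maximum-LLR null distribution, a node alerts only if its excess is unusual relative to the whole tree, not just to itself. In this toy example, the "bleed" node should have the largest LLR, and the balanced "mi" and "nausea" nodes should have an LLR of zero.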
Am J Epidemiol
A General Propensity Score for Signal Identification using Tree-Based Scan Statistics.