Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using two empirical examples within a three-site distributed data network, we tested combinations of three aggregate-level data-sharing approaches (risk-set, summary-table, effect-estimate), four confounding adjustment methods (matching, stratification, inverse probability weighting, match weighting), and two summary scores (propensity score, disease risk score) for binary and time-to-event outcomes. We assessed the performance of these combinations of data-sharing approach and adjustment method by comparing their results with those from the corresponding pooled individual-level data analysis (the reference). For both outcome types, the method combinations examined yielded results identical or comparable to the reference in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on the adjustment method; for example, risk-set data sharing with matched or stratified analysis of summary scores produced identical results, whereas weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data sharing generally performed better, whereas summary-table and effect-estimate data sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing individual-level data.
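As one illustration of the effect-estimate approach named above, the sketch below shows how a coordinating center might combine site-specific adjusted estimates without seeing individual-level records. It assumes fixed-effect inverse-variance pooling of log hazard ratios, which the abstract does not specify. The function name `pool_fixed_effect` and the three site values are hypothetical and are not results from the study.

```python
import math


def pool_fixed_effect(site_estimates):
    """Combine site-level estimates by inverse-variance weighting.

    site_estimates: list of (log_effect, standard_error) tuples, one per site.
    Returns the pooled log effect and its standard error.
    Assumption: a fixed-effect model (one common effect across sites).
    """
    # Each site's weight is the reciprocal of its estimate's variance.
    weights = [1.0 / (se ** 2) for _, se in site_estimates]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, site_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical adjusted hazard ratios and standard errors (log scale) from three sites.
sites = [
    (math.log(1.20), 0.10),
    (math.log(1.05), 0.15),
    (math.log(1.35), 0.25),
]

log_hr, se = pool_fixed_effect(sites)
hazard_ratio = math.exp(log_hr)
ci_low = math.exp(log_hr - 1.96 * se)
ci_high = math.exp(log_hr + 1.96 * se)
print(f"Pooled HR = {hazard_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Only two numbers per site leave each data partner in this scheme, which is why it is the most restrictive of the three sharing approaches. The same property may help explain the discrepancies reported for rare outcomes and small samples, where site-specific estimates can be unstable.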
Am. J. Epidemiol.
Privacy-Protecting Analytical Methods Using Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks.