Combining distributed regression and propensity scores: a doubly privacy-protecting analytic method for multicenter research.

Purpose

Sharing of detailed individual-level data continues to pose challenges in multi-center studies. This issue can be addressed in part by using analytic methods that require only summary-level information to perform the desired multivariable-adjusted analysis. We examined the feasibility and empirical validity of 1) conducting multivariable-adjusted distributed linear regression and 2) combining distributed linear regression with propensity scores, in a large distributed data network.

Patients and methods

We compared percent total weight loss 1-year postsurgery between Roux-en-Y gastric bypass and sleeve gastrectomy procedure among 43,110 patients from 36 health systems in the National Patient-Centered Clinical Research Network. We adjusted for baseline demographic and clinical variables as individual covariates, deciles of propensity scores, or both, in three separate outcome regression models. We used distributed linear regression, a method that requires only summary-level information (specifically, sums of squares and cross products matrix) from sites, to fit the three ordinary least squares linear regression models. A comparison set of analyses that used pooled deidentified individual-level data from sites served as the reference.

Results

Distributed linear regression produced results identical to those from the corresponding pooled individual-level data analysis for all variables in all three models. The maximum numerical difference in the parameter estimate or standard error for all the variables was 3×10 across three models.

Conclusion

Distributed linear regression analysis is a feasible and valid analytic method in multicenter studies for one-time continuous outcomes. Combining distributed regression with propensity scores via modeling offers more privacy protection and analytic flexibility.

Investigators

Sengwee Darren Toh

Abbreviation

Clin Epidemiol

Publication Date

2018-11-27

Volume

Page Numbers

1773-1786

Pubmed ID

30568510

Medium

Electronic-eCollection

Full Title

Combining distributed regression and propensity scores: a doubly privacy-protecting analytic method for multicenter research.

Authors

Toh S, Wellman R, Coley RY, Horgan C, Sturtevant J, Moyneur E, Janning C, Pardee R, Coleman KJ, Arterburn D, McTigue K, Anau J, Cook AJ