Super learning is an ensemble machine learning approach that is increasingly used as an alternative to classical prediction techniques. However, leaving the hyperparameters of its component algorithms untuned may adversely affect the performance of the super learner.
In this case study, we used data from a Canadian electronic prescribing system to predict when primary care physicians prescribed antidepressants for indications other than depression. The analysis included 73,576 antidepressant prescriptions and 373 candidate predictors. We derived two super learners: one using hyperparameter values tuned for each machine learning algorithm through an iterative grid search procedure, and the other using the default values. We compared the performance of the tuned super learner with that of the super learner using default values ("untuned") and with that of a carefully constructed logistic regression model from a previous analysis.
The tuned super learner had a scaled Brier score (R²) of 0.322 (95% CI 0.267-0.362). In comparison, the untuned super learner had a scaled Brier score of 0.309 (95% CI 0.256-0.353), corresponding to an efficiency loss of 4% relative to the tuned super learner (relative efficiency 0.96; 95% CI 0.93-0.99). The previously derived logistic regression model had a scaled Brier score of 0.307 (95% CI 0.245-0.360), corresponding to an efficiency loss of 5% relative to the tuned super learner (relative efficiency 0.95; 95% CI 0.88-1.01).
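The arithmetic behind these figures can be illustrated with a minimal sketch. It assumes the common definition of the scaled Brier score as one minus the ratio of the model's Brier score to that of a null model that always predicts the outcome prevalence, and it assumes that relative efficiency is the ratio of two scaled Brier scores; the reported point estimates are consistent with that reading, but the abstract does not spell out either definition. The function names are illustrative only.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true, y_prob):
    """Assumed definition: 1 - Brier(model) / Brier(null model).

    The null model predicts the observed prevalence for every observation.
    Undefined (division by zero) if all outcomes are identical.
    """
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / null_brier


def relative_efficiency(scaled_candidate, scaled_reference):
    """Assumed definition: ratio of the candidate's scaled Brier score to the reference's."""
    return scaled_candidate / scaled_reference


# Point estimates reported above, with the tuned super learner as reference.
re_untuned = relative_efficiency(0.309, 0.322)
re_logistic = relative_efficiency(0.307, 0.322)
print(f"untuned: {re_untuned:.2f}, logistic regression: {re_logistic:.2f}")
# prints "untuned: 0.96, logistic regression: 0.95"
```

Under these assumptions, the efficiency loss is one minus the relative efficiency, which gives the 4% and 5% quoted above. The confidence intervals cannot be reproduced from the point estimates alone.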
In this case study, hyperparameter tuning produced a super learner that performed slightly better than an untuned super learner. Tuning the hyperparameters of individual algorithms in a super learner may help optimize performance.

This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.