Methods for variable selection and treatment effect estimation in nonrandomized studies with few outcome events and many confounders

Bibliographic Details
Main Author: Franklin, Jessica M.
Corporate Author: Patient-Centered Outcomes Research Institute (U.S.)
Format: eBook
Language: English
Published: [Washington, D.C.] Patient-Centered Outcomes Research Institute (PCORI) 2018, [2018]
Series: Final research report
Collection: National Center for Biotechnology Information - Collection details see MPG.ReNa
Description
Summary: BACKGROUND: Nonrandomized studies of comparative effectiveness and safety evaluate treatments as used in routine care by diverse patient populations and are therefore critical for producing the information necessary for making patient-centered treatment decisions. Most nonrandomized studies based on electronic health care data use a propensity score (PS) to control for hundreds of measured covariates and to estimate the causal effect of treatment. Even in large studies, high-dimensional confounder control can lead to problems in causal inference due to unstable estimation of the PS model or inappropriate use of observations with extreme PS values. In studies with few outcome events, each observed event is highly influential, and these potential problems are exacerbated.
OBJECTIVE: To evaluate and improve analytic strategies for nonrandomized studies with many confounders and few outcome events, including methods for variable selection for the PS in health care database analyses and methods for treatment effect estimation based on the PS. METHODS: In the first simulation study, we compared the high-dimensional PS algorithm for variable selection with approaches that utilize direct adjustment for all potential confounders via regularized regression, including ridge regression and least absolute shrinkage and selection operator (lasso) regression. In the second simulation study, we compared a wide variety of propensity-based estimators of the marginal relative risk. In contrast to prior research that has focused on specific statistical methods in isolation from other analytic choices, we instead considered a method to be defined by the complete multistep process from PS modeling to final treatment effect estimation.
PS model estimation methods considered included ordinary logistic regression, Bayesian logistic regression, lasso, and boosted regression trees. Methods for utilizing the PS included pair matching, full matching, decile strata, fine strata, regression adjustment using 1 or 2 nonlinear splines, inverse propensity weighting, and matching weights. In each study, we based simulations on 2 previously published pharmacoepidemiologic cohorts and used the plasmode simulation framework to create realistic simulated data sets with many potential confounders, and we evaluated performance of methods with respect to bias and mean squared error of the estimated effects. RESULTS: In the first set of simulations, high-dimensional PS approaches generally performed better than regularized regression approaches. However, simulations that included the variables selected by lasso regression in a regular PS model also performed well.
In the second set of simulations, regression adjustment for the PS and matching weights provided lower bias and mean squared error in the context of rare binary outcomes. CONCLUSIONS: Some automated analysis approaches can provide highly robust treatment effect estimates across a wide variety of scenarios. Therefore, their use in nonrandomized PCOR studies from administrative health care databases would be expected to improve treatment effect estimates and eventually result in better treatment decision making by patients and providers. LIMITATIONS AND SUBPOPULATION CONSIDERATIONS: All simulation results are specific to the data-generating mechanisms studied. Although we attempted to explore a wide variety of realistic scenarios, simulation scenarios built on other base data sets could result in different conclusions.
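Illustrative note: the summary names inverse propensity weighting and matching weights among the PS-based estimators of the marginal relative risk. The following is a minimal sketch of how such weighted estimators are commonly computed, not the report's own code. The function name `weighted_relative_risk` and its arguments are hypothetical, the propensity scores are assumed to be already estimated, and the matching-weight formula used here is the standard one (the smaller of PS and 1 - PS, divided by the probability of the treatment actually received).

```python
import numpy as np

def weighted_relative_risk(treated, outcome, ps, kind="matching"):
    """Weighted estimate of the marginal relative risk (illustrative sketch).

    treated : 0/1 treatment indicator
    outcome : 0/1 outcome indicator
    ps      : estimated propensity scores, strictly between 0 and 1
    kind    : "ipw" for inverse propensity weights, "matching" for matching weights
    """
    treated = np.asarray(treated, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    ps = np.asarray(ps, dtype=float)

    # Probability of the treatment each subject actually received.
    p_received = treated * ps + (1.0 - treated) * (1.0 - ps)

    if kind == "ipw":
        weights = 1.0 / p_received
    elif kind == "matching":
        # Matching weights are bounded by 1, which limits the influence
        # of subjects with extreme PS values.
        weights = np.minimum(ps, 1.0 - ps) / p_received
    else:
        raise ValueError("kind must be 'ipw' or 'matching'")

    # Weighted outcome risk in each treatment group.
    risk_treated = np.sum(weights * treated * outcome) / np.sum(weights * treated)
    risk_untreated = (np.sum(weights * (1.0 - treated) * outcome)
                      / np.sum(weights * (1.0 - treated)))
    return risk_treated / risk_untreated
```

When every subject has a PS of 0.5, both weighting schemes give all subjects equal weight, so the estimate reduces to the crude relative risk. The two schemes differ when PS values are extreme: inverse propensity weights can grow very large, while matching weights never exceed 1.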
Physical Description: 1 PDF file (64 pages) illustrations