Year of Publication: 2019
Master of Public Health (M.P.H.)
Dr. Emily Slade
Dr. Heather Bush
Dr. David Fardo
Improper treatment of missing data can lead to biased or invalid results. If the data are missing at random (MAR), multiple imputation by chained equations (MICE) is one method used to reduce bias. When implementing MICE, the imputation model must be compatible with the final analysis model. We aim to show how to include interaction terms in the imputation model so that valid results are obtained. Data were simulated for one continuous outcome generated from two binary predictor variables and their interaction. Missingness was imposed via a MAR mechanism. To handle the missing data, four methods were applied: complete records analysis (CRA) and three variations of MICE whose imputation models differ in their inclusion of interaction effects. We also investigated two methods for specifying these imputation models in R using different arguments of the mice() function. The final analysis model was a linear regression of the outcome on the main effects of both predictors and their interaction. The analyses performed with MICE including all two-way interactions had the least biased estimates and appropriate coverage. CRA often led to wide confidence intervals, indicating lower efficiency. Use of the mice package in R is not entirely intuitive, and few resources exist online to assist R users who need to implement MICE in data with interaction effects. There are caveats that must be accounted for when coding the imputation procedure, and misspecification can lead to inappropriate results.
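The simulation design described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the thesis code. The sample size, the coefficient values, the choice to make only the second predictor incomplete, the logistic missingness mechanism, and the function names simulate and complete_records_fit are all assumptions made for illustration. The sketch covers data generation, a MAR mechanism, and the complete records analysis only; the multiple imputation step is not shown.

```python
import math
import random


def simulate(n=2000, seed=1):
    """Simulate rows (y, x1, x2); x2 is None where it is missing.

    All numeric values below are illustrative assumptions, not the
    settings used in the thesis.
    """
    rng = random.Random(seed)
    b0, b1, b2, b3 = 1.0, 0.5, 0.5, 1.0  # assumed true coefficients
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        x2 = rng.randint(0, 1)
        # Continuous outcome from two binary predictors and their interaction.
        y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.gauss(0.0, 1.0)
        # Assumed MAR mechanism: the chance that x2 is missing depends
        # only on y, which is always observed.
        p_miss = 1.0 / (1.0 + math.exp(-(-2.0 + 0.5 * y)))
        rows.append((y, x1, None if rng.random() < p_miss else x2))
    return rows


def complete_records_fit(rows):
    """Least-squares fit of y on x1, x2 and x1*x2 using complete rows only.

    With two binary predictors and their interaction the model is
    saturated, so the least-squares coefficients are exactly contrasts
    of the four cell means.
    """
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for y, x1, x2 in rows:
        if x2 is not None:  # drop incomplete records
            cells[(x1, x2)].append(y)
    m = {k: sum(v) / len(v) for k, v in cells.items()}
    intercept = m[(0, 0)]
    main_x1 = m[(1, 0)] - m[(0, 0)]
    main_x2 = m[(0, 1)] - m[(0, 0)]
    interaction = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]
    return (intercept, main_x1, main_x2, interaction)


if __name__ == "__main__":
    data = simulate()
    print(complete_records_fit(data))
```

Repeating simulate over many seeds and comparing the fitted coefficients with the assumed true values is one way to examine bias and efficiency for the complete records analysis under a chosen mechanism.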
Wilson, Nathaniel, "Methods for Multiple Imputation by Chained Equations Accounting for Missingness: A Simulation Study" (2019). Theses and Dissertations--Public Health (M.P.H. & Dr.P.H.). 252.
Available for download on Saturday, July 25, 2020