Public Health

Degree Name

Master of Public Health (M.P.H.)

Committee Chair

Dr. Emily Slade

Committee Member

Dr. Heather Bush

Committee Member

Dr. David Fardo


Improper treatment of missing data can lead to biased or invalid results. When data are missing at random (MAR), multiple imputation by chained equations (MICE) is one method used to reduce bias. For MICE to give valid results, the imputation model must be compatible with the final analysis model. We aim to show how to include interaction terms in the imputation model so that valid results are obtained. Data were simulated for one continuous outcome generated from two binary predictor variables and their interaction, and missingness was imposed via a MAR mechanism. To handle the missing data, we compared four methods: complete records analysis (CRA) and three variations of MICE whose imputation models differ in their inclusion of interaction effects. We also investigated two ways of specifying these imputation models in R using different arguments of the mice() function. The final analysis model was a linear regression of the outcome on both main effects of the predictors and their interaction. The analyses performed with MICE including all two-way interactions had the least biased estimates and appropriate confidence interval coverage. CRA often produced wide confidence intervals, indicating reduced efficiency. Using the mice package in R is not entirely intuitive, and few online resources exist to assist R users who need to implement MICE in data with interaction effects. Several caveats must be observed when coding the imputation procedure, and misspecification can lead to inappropriate results.
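The simulation design described in the abstract can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the thesis code, which was written in R with mice(). The coefficient values, sample size, seeds, and the specific MAR mechanism (the second predictor goes missing with a probability that depends on the fully observed outcome) are all assumptions chosen for illustration. The sketch covers only data generation, the missingness step, and complete records analysis; the MICE step is not shown.

```python
import random

# Hypothetical true coefficients, chosen for illustration only.
BETA = {"b0": 1.0, "b1": 2.0, "b2": -1.0, "b12": 1.5}


def simulate(n, seed=1):
    """Simulate n records: two binary predictors, their interaction,
    and a continuous outcome with standard normal error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.5)
        y = (BETA["b0"] + BETA["b1"] * x1 + BETA["b2"] * x2
             + BETA["b12"] * x1 * x2 + rng.gauss(0.0, 1.0))
        rows.append({"x1": x1, "x2": x2, "y": y})
    return rows


def impose_mar(rows, seed=2):
    """Set x2 to None with a probability that depends only on the
    fully observed outcome y (an assumed MAR mechanism)."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        p_miss = 0.5 if r["y"] > 2.0 else 0.1  # assumed probabilities
        x2 = None if rng.random() < p_miss else r["x2"]
        out.append({"x1": r["x1"], "x2": x2, "y": r["y"]})
    return out


def fit_saturated(rows):
    """Least-squares fit of y on x1, x2, and x1*x2. With two binary
    predictors the model is saturated, so the estimates are simple
    contrasts of the four cell means."""
    sums = {(a, b): [0.0, 0] for a in (0, 1) for b in (0, 1)}
    for r in rows:
        cell = sums[(r["x1"], r["x2"])]
        cell[0] += r["y"]
        cell[1] += 1
    m = {key: total / count for key, (total, count) in sums.items()}
    return {
        "b0": m[(0, 0)],
        "b1": m[(1, 0)] - m[(0, 0)],
        "b2": m[(0, 1)] - m[(0, 0)],
        "b12": m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)],
    }


full = simulate(20000)
observed = impose_mar(full)

# Complete records analysis: drop every record with a missing value.
complete = [r for r in observed if r["x2"] is not None]

full_fit = fit_saturated(full)
cra_fit = fit_saturated(complete)

for name in BETA:
    print(name, "true:", BETA[name],
          "full data:", round(full_fit[name], 3),
          "CRA:", round(cra_fit[name], 3))
```

Comparing `cra_fit` with `full_fit` gives a rough sense of how far the complete-records estimates move under this assumed mechanism. A single simulated data set says nothing about bias or coverage on its own; those quantities require repeating the procedure over many replicates, as the study does.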

Included in

Public Health Commons