Author ORCID Identifier

https://orcid.org/0000-0003-0969-7679

Date Available

5-23-2024

Year of Publication

2022

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Statistics

First Advisor

Dr. Katherine Thompson

Abstract

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. These adjustments are performed in a variety of ways. In this work, we consider four approaches to adjustment used by researchers in various fields. We compare the efficacy of these methods to what we call the “true model method”: fitting a multiple linear regression model in which the adjustment variables are model covariates. Our goal is to show that these adjustment methods perform worse than the true model method by comparing model parameter estimates, power, type I error, model fit statistics, and residuals. First, we evaluate the “subtraction adjustment method”, which is used for variables such as pre and post blood glucose levels. In this method, one adjusts for the pre value by subtracting it from the post value, and this difference becomes the new outcome. Next, we evaluate the “division adjustment method”, in which researchers adjust an outcome by dividing it by a variable such as body mass, and the result becomes the new outcome. The third method investigated is the “residual adjustment method”, in which a model is first fit on the adjustment variable to obtain residuals, and those residuals are then fit on the variable of interest. This approach has been used in human GWAS studies, where it is sometimes called “two-stage regression analysis.” The final method we investigated is Nanostring’s method for normalizing genetic data, a multistep process used in their software to adjust for variability in their machines’ assay process. To test these methods, we performed simulations to evaluate how they perform with and without an interaction effect in the data. We found cases where the models created using these methods could not accurately estimate the model coefficients. In addition, they completely miss interaction effects and can mask significant associations between the outcome and the variable of interest. Perhaps most troubling, some of the methods had lower power than simply fitting a linear regression model with the adjustment variables as covariates. Finally, even when the models from the adjustment methods performed comparably to the true model method, their model fit (measured by adjusted R²) was inferior. Because of this, we advocate that variable adjustments be made by adding the adjustment variables as covariates, rather than by using these methods to create adjusted outcomes.
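
The contrast between the residual adjustment method and covariate adjustment can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation's simulation design: the data-generating model, the coefficient values, the correlation `rho` between the variable of interest `x` and the adjustment variable `z`, and the helper names (`ols`, `simulate`, `covariate_adjusted`, `residual_adjusted`). It uses only NumPy least squares.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n=5000, beta_x=1.0, beta_z=2.0, rho=0.6, seed=0):
    """Hypothetical data: x and z have unit variance and correlation rho."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = rho * z + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    y = 0.5 + beta_x * x + beta_z * z + rng.normal(size=n)
    return x, z, y


def covariate_adjusted(x, z, y):
    """Slope on x when z enters the model as a covariate."""
    X = np.column_stack([np.ones_like(x), x, z])
    return ols(X, y)[1]


def residual_adjusted(x, z, y):
    """Two-stage approach: regress y on z, then regress those residuals on x."""
    Z = np.column_stack([np.ones_like(z), z])
    resid = y - Z @ ols(Z, y)
    X = np.column_stack([np.ones_like(x), x])
    return ols(X, resid)[1]


x, z, y = simulate()
est_covariate = covariate_adjusted(x, z, y)
est_residual = residual_adjusted(x, z, y)
print(f"covariate-adjusted slope on x: {est_covariate:.3f}")
print(f"residual-adjusted slope on x:  {est_residual:.3f}")
```

Under these assumed settings, the slope on `x` is 1.0 in the data-generating model. Because `x` and `z` are correlated, the two-stage estimate is pulled toward `beta_x * (1 - rho**2)`, which is 0.64 here, while the covariate-adjusted estimate stays close to 1.0. Setting `rho=0` in `simulate` removes the correlation, and the two estimates should then roughly agree.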

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2022.210
