Date Available

12-17-2022

Year of Publication

2020

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Statistics

First Advisor

Dr. Katherine Thompson

Second Advisor

Dr. Arnold Stromberg

Abstract

As data become increasingly available, statisticians are confronted with both larger sample sizes and larger numbers of predictors. While both of these factors are beneficial in building better predictive models and allowing for better inference, models can become difficult to interpret and often include variables of little practical significance. This dissertation provides methods that help model builders better understand and select from a collection of candidate models. We study the asymptotic distribution of AIC and propose a graphical tool to assist practitioners in comparing and contrasting candidate models. Real-world examples show how this graphic might be used, and a Shiny application that implements the graphic is provided. Results from simulation studies comparing different rules of thumb for selecting models using AIC are also given. Next, we consider how to measure the variability of R-squared when comparing nested models. Taking advantage of the F-test theory for multiple linear regression, we derive exact confidence intervals for the information lost. A real-world example that estimates the information lost when aggregating variables from questionnaires is examined. Finally, we discuss future directions of this work, including a method for selecting models based on a given observation.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2020.518
