Author ORCID Identifier

https://orcid.org/0000-0003-3495-3873

Date Available

1-18-2025

Year of Publication

2024

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Statistics

First Advisor

Dr. Katherine Thompson

Abstract

This paper introduces the bar-code variable, a novel method for processing a sequence of binary explanatory variables efficiently in the linear regression modeling framework. Represented as an integer or a sequence of bits, the bar-code variable captures infor- mation on original binary variables and their potential interaction effects. Utilizing the bar-code variable, the study explores streamlined feature selection in linear re- gression modeling with binary explanatory variables. The paper demonstrates how the bar-code variable, through re-parameterization, facilitates the transition from cell means estimates, µ̂, in the cell-means ANOVA model to coefficient estimates, β̂, in the linear regression model, and vice versa. The adoption of bar-code variable most importantly improves memory usage and computational efficiency. Furthermore, this provides a unique perspective on feature selection from all possible interaction effects when the use of the bar-code variable is extended to be integrated with agglomerative clustering and Lasso regression. Additionally, two novel importance score methods are introduced to further leverage the bar-code variable in identifying interaction effects. These findings will contribute to a more efficient and insightful statistical analysis approach.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.21

Available for download on Saturday, January 18, 2025

Share

COinS