
Author ORCID Identifier

https://orcid.org/0000-0003-3495-3873

Date Available

1-18-2025

Year of Publication

2024

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Arts and Sciences

Department/School/Program

Statistics

Faculty

Dr. Katherine Thompson

Abstract

This paper introduces the bar-code variable, a novel method for efficiently processing a sequence of binary explanatory variables in the linear regression modeling framework. Represented as an integer or a sequence of bits, the bar-code variable captures information on the original binary variables and their potential interaction effects. Using the bar-code variable, the study explores streamlined feature selection in linear regression modeling with binary explanatory variables. The paper demonstrates how the bar-code variable, through re-parameterization, facilitates the transition from cell-means estimates, µ̂, in the cell-means ANOVA model to coefficient estimates, β̂, in the linear regression model, and vice versa. Most importantly, adopting the bar-code variable improves memory usage and computational efficiency. Furthermore, when the bar-code variable is integrated with agglomerative clustering and Lasso regression, it provides a unique perspective on feature selection from all possible interaction effects. Additionally, two novel importance score methods are introduced to further leverage the bar-code variable in identifying interaction effects. These findings contribute to a more efficient and insightful approach to statistical analysis.
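
As a rough illustration of the ideas in the abstract, the following Python sketch packs a row of binary variables into a single integer and converts between cell means and regression coefficients for a saturated model with 0/1 dummy coding. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names (`barcode`, `design_matrix`, `mu_to_beta`, `beta_to_mu`), the bit ordering (column j maps to bit j), and the use of NumPy are all hypothetical choices made here for illustration.

```python
import numpy as np


def barcode(X):
    """Pack each row of a 0/1 matrix into one integer (assumed: column j -> bit j)."""
    X = np.asarray(X, dtype=np.int64)
    weights = 2 ** np.arange(X.shape[1], dtype=np.int64)
    return X @ weights


def design_matrix(k):
    """Saturated 0/1 design for k binary variables.

    Rows index cells and columns index model terms, both by integer code.
    Entry [c, s] is 1 when every bit set in s is also set in c, meaning the
    main effect or interaction s is active in cell c.
    """
    codes = np.arange(2 ** k)
    return ((codes[:, None] & codes[None, :]) == codes[None, :]).astype(float)


def mu_to_beta(mu):
    """Cell means -> coefficients by solving A @ beta = mu."""
    mu = np.asarray(mu, dtype=float)
    k = len(mu).bit_length() - 1  # len(mu) is assumed to be a power of two
    return np.linalg.solve(design_matrix(k), mu)


def beta_to_mu(beta):
    """Coefficients -> cell means: sum the terms active in each cell."""
    beta = np.asarray(beta, dtype=float)
    k = len(beta).bit_length() - 1
    return design_matrix(k) @ beta


# Two binary variables give four cells with codes 0..3.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
codes = barcode(X)

# Hypothetical cell means, ordered by code.
mu = [1.0, 3.0, 4.0, 10.0]
beta = mu_to_beta(mu)        # intercept, x1, x2, x1:x2
mu_back = beta_to_mu(beta)   # round trip back to the cell means
```

With these made-up cell means, the intercept is the mean of cell 0, each main effect is the difference between a single-variable cell and cell 0, and the interaction term is 10 − 3 − 4 + 1 = 4. Storing one integer per observation instead of one column per variable and interaction is one plausible reading of the memory savings the abstract mentions.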

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.21
