Author ORCID Identifier
Date Available
1-18-2025
Year of Publication
2024
Degree Name
Doctor of Philosophy (PhD)
Document Type
Doctoral Dissertation
College
Arts and Sciences
Department/School/Program
Statistics
First Advisor
Dr. Katherine Thompson
Abstract
This paper introduces the bar-code variable, a novel method for processing a sequence of binary explanatory variables efficiently in the linear regression modeling framework. Represented as an integer or a sequence of bits, the bar-code variable captures information on the original binary variables and their potential interaction effects. Using the bar-code variable, the study explores streamlined feature selection in linear regression modeling with binary explanatory variables. The paper demonstrates how the bar-code variable, through re-parameterization, facilitates the transition from cell means estimates, µ̂, in the cell-means ANOVA model to coefficient estimates, β̂, in the linear regression model, and vice versa. Most importantly, adopting the bar-code variable improves memory usage and computational efficiency. Furthermore, when the bar-code variable is integrated with agglomerative clustering and Lasso regression, it provides a unique perspective on feature selection among all possible interaction effects. Additionally, two novel importance score methods are introduced to further leverage the bar-code variable in identifying interaction effects. These findings contribute to a more efficient and insightful approach to statistical analysis.
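The idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`encode`, `decode`, `means_to_betas`, `betas_to_means`) and the bit ordering are assumptions made here for illustration. The sketch packs binary explanatory variables into one integer, then uses the standard inclusion-exclusion identity for a saturated model with 0/1 coding to move between cell means µ and regression coefficients β.

```python
def encode(bits):
    """Pack a sequence of 0/1 values into one integer.

    Assumption: the first variable occupies the lowest bit.
    """
    code = 0
    for position, bit in enumerate(bits):
        code |= (bit & 1) << position
    return code


def decode(code, n_vars):
    """Recover the list of 0/1 values from an integer code."""
    return [(code >> position) & 1 for position in range(n_vars)]


def is_subset(t, s):
    """True when every bit set in t is also set in s."""
    return t & s == t


def means_to_betas(mu, n_vars):
    """Cell means -> saturated-model coefficients (inclusion-exclusion).

    beta[s] = sum over subsets t of s of (-1)^(|s| - |t|) * mu[t],
    where s and t are integer codes and mu maps every code to a cell mean.
    """
    betas = {}
    for s in range(1 << n_vars):
        total = 0.0
        for t in range(1 << n_vars):
            if is_subset(t, s):
                # s ^ t holds the bits in s but not in t, since t is a subset of s.
                sign = -1 if bin(s ^ t).count("1") % 2 else 1
                total += sign * mu[t]
        betas[s] = total
    return betas


def betas_to_means(betas, n_vars):
    """Coefficients -> cell means: mu[s] = sum of beta[t] over subsets t of s."""
    return {
        s: sum(betas[t] for t in range(1 << n_vars) if is_subset(t, s))
        for s in range(1 << n_vars)
    }


# Hypothetical cell means for two binary variables (codes 0..3).
example_mu = {0: 1.0, 1: 3.0, 2: 4.0, 3: 10.0}
example_betas = means_to_betas(example_mu, 2)
```

In this sketch, code 0 corresponds to the intercept, codes 1 and 2 to the two main effects, and code 3 to their interaction, so a single integer key indexes both a cell and a model term.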
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.21
Recommended Citation
Park, Lee Sak, "Bar-Code Variable: A Novel Approach to Efficiently Find Interaction Effects" (2024). Theses and Dissertations--Statistics. 74.
https://uknowledge.uky.edu/statistics_etds/74