Date Available
1-18-2025
Year of Publication
2024
Document Type
Doctoral Dissertation
Degree Name
Doctor of Philosophy (PhD)
College
Arts and Sciences
Department/School/Program
Statistics
Faculty
Dr. Katherine Thompson
Abstract
This paper introduces the bar-code variable, a novel method for efficiently processing a sequence of binary explanatory variables within the linear regression modeling framework. Represented as an integer or a sequence of bits, the bar-code variable captures information on the original binary variables and their potential interaction effects. Using the bar-code variable, the study explores streamlined feature selection in linear regression models with binary explanatory variables. The paper demonstrates how the bar-code variable, through re-parameterization, facilitates the transition from the cell-means estimates, µ̂, of the cell-means ANOVA model to the coefficient estimates, β̂, of the linear regression model, and vice versa. Most importantly, adopting the bar-code variable improves memory usage and computational efficiency. Furthermore, integrating the bar-code variable with agglomerative clustering and Lasso regression offers a unique perspective on feature selection among all possible interaction effects. Additionally, two novel importance-score methods are introduced to further leverage the bar-code variable in identifying interaction effects. These findings contribute to a more efficient and insightful approach to statistical analysis.
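The abstract does not spell out the construction, so what follows is only a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes binary predictors coded 0/1, column j mapped to bit 2^j, and a saturated model in which all 2^p cells are observed. Under those assumptions each bar-code indexes one cell of the cell-means ANOVA model, and the same integer, read as a set of bits, indexes one term of the full-interaction regression. The cell mean µ_b is then the sum of β_S over every term S whose bits are all set in b, so β̂ can be recovered from µ̂ by Möbius inversion over subsets, and µ̂ from β̂ by the forward (zeta) sum. The helper names bar_code, cell_means, mu_to_beta and beta_to_mu, and the toy data, are hypothetical.

```python
def bar_code(bits):
    """Pack a 0/1 sequence into one integer; column j contributes 2**j (assumed ordering)."""
    code = 0
    for j, b in enumerate(bits):
        code |= (int(b) & 1) << j
    return code


def cell_means(X, y, p):
    """Average response in each of the 2**p bar-code cells, as a list indexed by code."""
    sums = [0.0] * (2 ** p)
    counts = [0] * (2 ** p)
    for row, yi in zip(X, y):
        c = bar_code(row)
        sums[c] += yi
        counts[c] += 1
    if 0 in counts:
        raise ValueError("every cell must be observed for the saturated model")
    return [s / n for s, n in zip(sums, counts)]


def _subset_transform(values, p, sign):
    """Zeta (sign=+1) or Moebius (sign=-1) transform over subsets of p bits; returns a new list."""
    out = list(values)
    for j in range(p):
        bit = 1 << j
        for c in range(2 ** p):
            if c & bit:
                # c ^ bit lacks bit j, so it is not modified in this pass
                out[c] += sign * out[c ^ bit]
    return out


def mu_to_beta(mu, p):
    """beta_S = sum over T subset of S of (-1)**(|S|-|T|) * mu_T (Moebius inversion)."""
    return _subset_transform(mu, p, -1)


def beta_to_mu(beta, p):
    """mu_b = sum of beta_S over every term S whose bits are all set in b."""
    return _subset_transform(beta, p, +1)


# Usage with invented toy data: two binary predictors x1 (bit 0) and x2 (bit 1).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 1]]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
mu = cell_means(X, y, p=2)    # [1.0, 3.0, 2.0, 7.0]
beta = mu_to_beta(mu, p=2)    # [1.0, 2.0, 1.0, 3.0]: intercept, x1, x2, x1:x2
```

Both transforms work on plain lists indexed by the bar-code and take O(p·2^p) operations, so the 2^p-column interaction design matrix is never materialized; this is one way the memory and speed benefits described above could arise, though the dissertation's own procedure may differ.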
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.21
Recommended Citation
Park, Lee Sak, "Bar-Code Variable: A Novel Approach to Efficiently Find Interaction Effects" (2024). Theses and Dissertations--Statistics. 74.
https://uknowledge.uky.edu/statistics_etds/74
