Doctor of Philosophy (PhD)
Arts and Sciences
Dr. Katherine Thompson
This paper introduces the bar-code variable, a novel method for efficiently processing a sequence of binary explanatory variables within the linear regression modeling framework. Represented as an integer or a sequence of bits, the bar-code variable captures information on the original binary variables and their potential interaction effects. Using the bar-code variable, the study explores streamlined feature selection in linear regression models with binary explanatory variables. The paper demonstrates how the bar-code variable, through re-parameterization, facilitates the transition from cell-means estimates, µ̂, in the cell-means ANOVA model to coefficient estimates, β̂, in the linear regression model, and vice versa. Most importantly, adopting the bar-code variable improves memory usage and computational efficiency. Furthermore, integrating the bar-code variable with agglomerative clustering and Lasso regression offers a distinctive perspective on selecting features from among all possible interaction effects. Additionally, two novel importance-score methods are introduced to further leverage the bar-code variable in identifying interaction effects. These findings contribute to a more efficient and insightful approach to statistical analysis.
Park, Lee Sak, "Bar-Code Variable: A Novel Approach to Efficiently Find Interaction Effects" (2024). Theses and Dissertations--Statistics. 74.
Available for download on Saturday, January 18, 2025