Author ORCID Identifier
https://orcid.org/0009-0000-1817-9641
Date Available
7-27-2026
Year of Publication
2024
Degree Name
Doctor of Philosophy (PhD)
Document Type
Doctoral Dissertation
College
Arts and Sciences
Department/School/Program
Statistics
First Advisor
Dr. Chenglong Ye
Abstract
For high-dimensional data where the number of variables greatly exceeds the number of observations, selecting important variables while maintaining the required heredity conditions can be challenging. This dissertation is structured into three interconnected parts. In the first part, we propose a variable selection method based on a well-known optimization technique, the Genetic Algorithm, and develop an R package to simplify its implementation and use. We then propose another variable selection method by extending the study from the Genetic Algorithm to a different but related optimization technique, Simulated Annealing. We consider three different hierarchical structures in both studies and compare the performance and efficiency of the two proposed algorithms using multiple simulation studies. In the last part of the dissertation, we propose a transfer-learning-inspired algorithm with a specific focus on microbiome-metabolome interactions. We compare the proposed method with existing methods in terms of mean squared error, type I error, and power. An application of this method to real-world data reveals biologically significant interactions between gut microbes and various bile acids.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.284
Recommended Citation
Li, Leiyue, "Variable Selection for High-Dimensional Data with Interaction Effects: Methods, Applications, and Inferences" (2024). Theses and Dissertations--Statistics. 77.
https://uknowledge.uky.edu/statistics_etds/77
Included in
Applied Statistics Commons, Biostatistics Commons, Statistical Methodology Commons, Statistical Models Commons