Doctor of Philosophy (PhD)
Arts and Sciences
Dr. Chenglong Ye
Dr. Derek Young
This dissertation focuses on the problem of high-dimensional data analysis, which arises in many fields, including genomics, finance, and the social sciences. In such settings, the number of features or variables is often much larger than the number of observations, which poses significant challenges for traditional statistical methods.
To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response. Specifically, we propose a robust nonparametric screening method that effectively selects predictors that are marginally independent of, but conditionally dependent on, the response.
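The phenomenon of a predictor that is marginally independent of the response yet conditionally dependent on it can be illustrated with a toy Gaussian example. The sketch below assumes a linear model with two correlated Gaussian predictors, with coefficients chosen (hypothetically) so that X1 has zero marginal correlation with Y but a clear partial correlation given X2; because (X1, Y) is jointly Gaussian here, zero correlation means marginal independence. It only shows why ranking predictors by marginal association can miss such variables; it is not the robust nonparametric screening method proposed in the dissertation, and the helper names `simulate`, `residualize`, and `corr` are hypothetical.

```python
import numpy as np

# Toy model (assumption): X1, X2 ~ N(0, 1) with corr(X1, X2) = rho, and
# Y = X1 - (1 / rho) * X2 + noise, so cov(X1, Y) = 1 - rho / rho = 0.
def simulate(n=20_000, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    y = x1 - x2 / rho + rng.standard_normal(n)
    return x1, x2, y

def residualize(v, z):
    """Residuals of v after least-squares regression on [1, z]."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def corr(a, b):
    """Sample Pearson correlation as a Python float."""
    return float(np.corrcoef(a, b)[0, 1])

x1, x2, y = simulate()
r_marg = corr(x1, y)                                    # marginal: close to 0
r_part = corr(residualize(x1, x2), residualize(y, x2))  # partial given X2: about 0.65
print(f"marginal corr(X1, Y)     = {r_marg:.3f}")
print(f"partial corr(X1, Y | X2) = {r_part:.3f}")
```

In this toy setting, a screening rule that keeps only the variables with the largest marginal |correlation| would likely drop X1, even though X1 enters the model with a nonzero coefficient.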
The second part of the dissertation focuses on high-dimensional inference. The problem arises from microbiome and metabolome studies. The microbial community in the human gut is teeming with metabolic activity and plays a key role in host physiology and health. However, the molecular mechanisms underlying host-microbiome interactions are not well understood, although microbial metabolites have been hypothesized to play a critical role. This motivates us to develop a statistical framework that first quantifies the abundances of microbial metabolites and then examines the associations between these metabolites and disease outcomes. The framework also accounts for potentially high-dimensional microbiome confounders, thereby avoiding false discoveries of disease-associated metabolites. We tackle this challenging inference problem by building on the idea of the debiased lasso. In numerical studies, we demonstrate a substantial power improvement over several popular existing methods.
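To make the debiasing idea concrete, the following is a generic sketch of one common form of the debiased lasso for a single coefficient in a plain linear model y = Xβ + ε. It assumes an i.i.d. Gaussian design, a known noise level σ, and a penalty of order sqrt(2 log p / n); the helper names `lasso_cd` and `debiased_coef` are hypothetical. It does not include the metabolite-quantification step or the microbiome-confounder adjustment of the proposed framework. The lasso estimate of coefficient j is corrected by a one-step term built from the residual z of a nodewise lasso regression of column j on the remaining columns, giving b_j + zᵀ(y − Xb) / zᵀx_j.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # equals y - X @ beta while beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]         # partial residual without coordinate j
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * beta[j]
    return beta

def debiased_coef(X, y, j, lam, lam_node):
    """Lasso estimate, one-step debiased estimate, and per-unit-sigma SE for coordinate j."""
    beta = lasso_cd(X, y, lam)
    others = np.delete(np.arange(X.shape[1]), j)
    gamma = lasso_cd(X[:, others], X[:, j], lam_node)  # nodewise lasso
    z = X[:, j] - X[:, others] @ gamma                 # score vector for coordinate j
    denom = z @ X[:, j]
    b_db = beta[j] + z @ (y - X @ beta) / denom        # one-step bias correction
    se_unit = np.linalg.norm(z) / abs(denom)           # multiply by sigma for the SE
    return beta[j], b_db, se_unit

# Hypothetical simulation: n = 200, p = 300, three true signals of size 1, known sigma = 1.
rng = np.random.default_rng(1)
n, p, sigma = 200, 300, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 1.0
y = X @ beta_true + sigma * rng.standard_normal(n)
lam = sigma * np.sqrt(2 * np.log(p) / n)

lasso_0, db_0, se_0 = debiased_coef(X, y, 0, lam, lam)  # true coefficient is 1
lasso_9, db_9, se_9 = debiased_coef(X, y, 9, lam, lam)  # true coefficient is 0
z_9 = db_9 / (sigma * se_9)                             # approximately N(0, 1) under the null
print(f"coef 0: lasso {lasso_0:.3f}, debiased {db_0:.3f} +/- {1.96 * sigma * se_0:.3f}")
print(f"coef 9: z-statistic {z_9:.2f}")
```

Under these assumptions, the debiased estimate of the first coefficient should land near 1, whereas the raw lasso estimate is typically shrunk toward 0; in practice σ would have to be estimated rather than assumed known.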
This study is partially funded by NSF-CIF-1813330.
Fang, Lei, "High Dimensional Data Analysis: variable screening and inference" (2023). Theses and Dissertations--Statistics. 70.