Date Available
8-8-2023
Year of Publication
2023
Degree Name
Doctor of Philosophy (PhD)
Document Type
Doctoral Dissertation
College
Arts and Sciences
Department/School/Program
Statistics
First Advisor
Dr. Chenglong Ye
Second Advisor
Dr. Derek Young
Abstract
This dissertation focuses on the problem of high-dimensional data analysis, which arises in many fields, including genomics, finance, and the social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.
To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method that effectively selects predictors that are marginally independent of, but conditionally dependent on, the response.
The second part of the dissertation focuses on high-dimensional inference. The problem arises from microbiome and metabolome studies. The microbial community in the human gut is teeming with metabolic activity and plays a key role in host physiology and health. However, the molecular mechanisms of host-microbiome interactions are not well understood, and microbial metabolites have been hypothesized to play a critical role. This motivates us to develop a statistical framework that first quantifies the abundances of microbial metabolites and then examines the associations between these metabolites and disease outcomes. The framework also accounts for potential high-dimensional microbiome confounders, thereby avoiding potential false discoveries of disease-associated metabolites. We address this challenging inference problem using the idea of the debiased lasso. In numerical studies, we demonstrate a significant improvement in power compared with some popular existing methods.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2023.379
Funding Information
This study is partially funded by NSF-CIF-1813330.
Recommended Citation
Fang, Lei, "High Dimensional Data Analysis: variable screening and inference" (2023). Theses and Dissertations--Statistics. 70.
https://uknowledge.uky.edu/statistics_etds/70
Included in
Biostatistics Commons, Data Science Commons, Statistical Methodology Commons, Statistical Models Commons, Statistical Theory Commons