Date Available
5-13-2020
Year of Publication
2019
Degree Name
Doctor of Philosophy (PhD)
Document Type
Doctoral Dissertation
College
Arts and Sciences
Department/School/Program
Statistics
First Advisor
Dr. Ruriko Yoshida
Abstract
A phylogenetic tree is a tree that represents the evolutionary history among species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and it is well known that statistical learning methods are needed to handle and analyze the large amounts of data that new technologies can generate relatively cheaply. Building on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, which is intrinsically unsupervised, can find outliers among thousands of genes or more. Its ability to analyze large numbers of genes (even with missing information) makes it unique among many parametric methods. At the same time, the statistical analysis of the high-dimensional space of phylogenetic trees remains an active area of research, and many tree metrics have been proposed for statistical methodology. The tropical metric is one of them. We implement an MCMC sampling method to estimate the principal components in a tree space equipped with the tropical metric, achieving dimension reduction and visualizing the result in a 2-D tropical triangle.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2019.189
Recommended Citation
Kang, Qiwen, "UNSUPERVISED LEARNING IN PHYLOGENOMIC ANALYSIS OVER THE SPACE OF PHYLOGENETIC TREES" (2019). Theses and Dissertations--Statistics. 39.
https://uknowledge.uky.edu/statistics_etds/39