Author ORCID Identifier

https://orcid.org/0000-0002-0281-3940

Date Available

8-5-2026

Year of Publication

2024

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Arts and Sciences

Department/School/Program

Statistics

Faculty

Chi Wang

Faculty

Katherine Thompson

Abstract

Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular heterogeneity and gene expression dynamics. Despite its potential, the inherent noise and sparsity of scRNA-seq data pose significant challenges in clustering cells into biologically meaningful groups. This dissertation addresses these challenges through two novel methodologies aimed at enhancing the accuracy and robustness of scRNA-seq data analysis.

First, we introduce the Differential Feature Selection (DIFS) framework, designed to improve the identification of differential features in scRNA-seq data. DIFS employs a two-stage marker identification process. In the first stage, a modified Dip Test is used to filter and identify genes with significant multimodal expression patterns, marking them as potential Stage I markers. The second stage focuses on identifying cluster-specific features or Stage II markers using clustering methods and Fisher's exact test. This combined approach provides a robust method for selecting highly informative genes, improving cell type classification and the understanding of cellular heterogeneity.
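As a rough illustration of the Stage II idea, the sketch below tests whether a gene is enriched in one cluster using a one-sided Fisher's exact test on a 2x2 table (cells inside or outside the cluster, gene expressed or not). The table layout, the binarized expression input, and the function names `fisher_exact_greater` and `cluster_enrichment_pvalue` are assumptions made for this example and are not taken from the dissertation. The Stage I dip test is not shown.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the probability of seeing at least `a` counts in the top-left cell.
    """
    row1 = a + b          # cells in the cluster
    col1 = a + c          # cells expressing the gene
    n = a + b + c + d     # all cells
    upper = min(row1, col1)
    # Count tables at least as extreme as the observed one (exact integers).
    favorable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(n, row1)


def cluster_enrichment_pvalue(expressed, labels, cluster):
    """Hypothetical helper: build the 2x2 table for one gene and one cluster.

    expressed: list of bools, True if the gene is detected in that cell
    labels:    list of cluster labels, one per cell
    cluster:   the cluster to test for enrichment
    """
    a = sum(1 for e, l in zip(expressed, labels) if l == cluster and e)
    b = sum(1 for e, l in zip(expressed, labels) if l == cluster and not e)
    c = sum(1 for e, l in zip(expressed, labels) if l != cluster and e)
    d = sum(1 for e, l in zip(expressed, labels) if l != cluster and not e)
    return fisher_exact_greater(a, b, c, d)


# Toy data: a gene detected in every cell of cluster "A" and in none of "B".
expressed = [True, True, True, False, False, False]
labels = ["A", "A", "A", "B", "B", "B"]
p_value = cluster_enrichment_pvalue(expressed, labels, "A")  # 1/20 = 0.05
```

In practice a p-value like this would be computed for every candidate gene and cluster, with a multiple-testing correction applied; how DIFS handles that step is not stated in the abstract.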

Second, we present a semi-supervised clustering approach that integrates SingleR, a supervised classification tool, with a modified Hierarchical Dirichlet Process (normHDP) model. SingleR provides preliminary cell type annotations based on a reference dataset, generating a similarity score matrix for each cell. This matrix serves as prior information in the Bayesian framework of the normHDP model, which is adapted to focus on clustering while retaining essential features such as batch effect removal. This integration allows for dynamic adjustment of cluster assignments, accounting for both prior knowledge and unique patterns in new scRNA-seq data. To address the challenges of high-dimensional MCMC in clustering, we adopt an ensemble approach, running multiple chains with fewer iterations each to improve posterior exploration and clustering accuracy.
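One common way to pool cluster assignments from several MCMC chains is a co-clustering (posterior similarity) matrix, which is unaffected by label switching between chains. The sketch below is an assumption about how such an ensemble could be summarized; the abstract does not say how the chains are combined, and the function name `coclustering_matrix` is hypothetical.

```python
def coclustering_matrix(assignments):
    """Fraction of chains in which each pair of cells shares a cluster.

    assignments: list of label lists, one per chain, all of equal length.
    Returns an n x n list of lists with entries in [0, 1].
    """
    n_chains = len(assignments)
    n_cells = len(assignments[0])
    matrix = [[0.0] * n_cells for _ in range(n_cells)]
    for labels in assignments:
        for i in range(n_cells):
            for j in range(n_cells):
                # Only co-membership matters, not the label values themselves.
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    return [[count / n_chains for count in row] for row in matrix]


# Three toy chains over three cells; the second is the first with labels swapped.
chains = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
similarity = coclustering_matrix(chains)
```

A final partition could then be obtained by clustering this matrix, for example with hierarchical clustering on one minus the similarity.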

We evaluate both methodologies using synthetic and real datasets, demonstrating their superior performance compared to traditional clustering techniques. Our findings highlight the effectiveness of these approaches in reducing the impact of data sparsity and noise, leading to more reliable identification of cell types and states. These methodologies offer powerful tools for the detailed analysis of cellular heterogeneity in complex biological systems, advancing our understanding of underlying biological processes and disease mechanisms.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.367

Recommended Citation

Liu, Kun, "Difs and Bayescluster: Novel Methods for Single_cell RNA Sequencing Analysis" (2024). Theses and Dissertations--Statistics. 79.
https://uknowledge.uky.edu/statistics_etds/79

Included in

Bioinformatics Commons, Biostatistics Commons, Statistical Models Commons
