Year of Publication


Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation




Computer Science

First Advisor

Dr. Jinze Liu


The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under the particular condition such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing database, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, the sequence reads obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We have proposed two methods directly addressing this challenge. First, we developed a novel method MultiSplice to accurately estimate the abundance of the well-annotated transcripts. Driven by the desire of detecting novel isoforms, a max-flow-min-cost algorithm named Astroid is designed for simultaneously discovering the presence and quantities of all possible transcripts in the transcriptome. We further extend an \emph{ab initio} pipeline of transcriptome analysis to large-scale dataset which may contain hundreds of samples. The effectiveness of proposed methods has been supported by a series of simulation studies, and their application on real datasets suggesting a promising opportunity in reconstructing mRNA transcriptome which is critical for revealing variations among cells (e.g. disease vs. normal).