Date Available

1-18-2015

Year of Publication

2015

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Engineering

Department/School/Program

Computer Science

First Advisor

Dr. Jinze Liu

Abstract

The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population, helping to reveal the characteristics of the cell under a particular condition such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript, and how to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We propose two methods that directly address this challenge. First, we developed a novel method, MultiSplice, to accurately estimate the abundance of well-annotated transcripts. Second, driven by the desire to detect novel isoforms, we designed a max-flow-min-cost algorithm named Astroid for simultaneously discovering the presence and quantities of all possible transcripts in the transcriptome. We further extend an ab initio pipeline of transcriptome analysis to large-scale datasets, which may contain hundreds of samples. The effectiveness of the proposed methods is supported by a series of simulation studies, and their application to real datasets suggests a promising opportunity for reconstructing the mRNA transcriptome, which is critical for revealing variations among cells (e.g. disease vs. normal).
