The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, which focuses on high sensitivity and specificity in the detection of splices as well as on CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. MapSplice does not depend on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of the read alignments supporting a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.
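The abstract does not say how the diversity of read alignments is measured. As an illustration only, one plausible measure is the Shannon entropy of the positions at which supporting reads cross a junction: reads that cross at many different offsets give high entropy, while reads stacked at a single offset give zero. The function name `junction_offset_entropy` and the sample offsets below are hypothetical and are not taken from the MapSplice software.

```python
import math
from collections import Counter


def junction_offset_entropy(offsets):
    """Shannon entropy (in bits) of the read offsets at which a junction is crossed.

    `offsets` is a list of integers, one per supporting read, giving the
    position of the splice junction within that read (an assumed encoding).
    """
    if not offsets:
        return 0.0
    counts = Counter(offsets)
    total = len(offsets)
    # Sum p * log2(1/p) over the observed offsets.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical supporting reads for two candidate junctions.
diverse_support = [5, 12, 20, 31, 40, 47, 55, 63]  # eight distinct offsets
stacked_support = [30] * 8                         # all reads cross at one offset

diverse_score = junction_offset_entropy(diverse_support)
stacked_score = junction_offset_entropy(stacked_support)
```

Under this sketch, a junction supported by the same number of reads scores higher when those reads are spread across distinct offsets, which is one way to separate well-supported junctions from likely alignment artifacts.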
Wang, Kai; Singh, Darshan; Zeng, Zheng; Coleman, Stephen J.; Huang, Yan; Savich, Gleb L.; He, Xiaping; Mieczkowski, Piotr; Grimm, Sara A.; Perou, Charles M.; Macleod, James N.; Chiang, Derek Y.; Prins, Jan F.; and Liu, Jinze, "MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery" (2010). Computer Science Faculty Publications. 4.