Abstract

Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that use a standard reference genome as a template sometimes have difficulty mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and to the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and aligns reads without bias against genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and uncovers many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. The software is available at https://github.com/LiuBioinfo/iMapSplice.
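The effect of a splice site polymorphism on the dinucleotide motif can be illustrated with a toy sketch. This is not iMapSplice's implementation; the function names, the 0-based half-open intron coordinates, and the example sequence are all assumptions made for illustration. It shows how an intron that looks non-canonical against the reference genome can carry the canonical GT-AG motif once a personal SNP is applied.

```python
from typing import Dict, Tuple


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with alternate bases substituted
    at the given 0-based SNP positions (hypothetical helper)."""
    bases = list(reference)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


def splice_motif(sequence: str, intron_start: int, intron_end: int) -> Tuple[str, str]:
    """Return the (donor, acceptor) dinucleotides of an intron spanning
    [intron_start, intron_end), 0-based and half-open."""
    donor = sequence[intron_start:intron_start + 2]
    acceptor = sequence[intron_end - 2:intron_end]
    return donor, acceptor


def is_canonical(motif: Tuple[str, str]) -> bool:
    """True for the most common GT-AG splice site motif."""
    return motif == ("GT", "AG")


# Toy data: exon "ACG", intron at positions 3..10, exon "CCT".
reference = "ACGGACAAAAGCCT"
personal_snps = {4: "T"}  # assumed SNP: A -> T at the second donor base
personal = apply_snps(reference, personal_snps)

ref_motif = splice_motif(reference, 3, 11)
personal_motif = splice_motif(personal, 3, 11)

print("reference:", ref_motif, is_canonical(ref_motif))
print("personal: ", personal_motif, is_canonical(personal_motif))
```

An aligner that scores junctions only by the reference motif would treat this junction as non-canonical, while one that also considers the alternative base would see a canonical donor site.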

Document Type

Article

Publication Date

8-10-2018

Notes/Citation Information

Published in PLOS ONE, v. 13, no. 8, e0201554, p. 1-14.

© 2018 Liu et al.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Digital Object Identifier (DOI)

https://doi.org/10.1371/journal.pone.0201554

Funding Information

US National Science Foundation [CAREER award grant number 1054631 to J.L.]; National Institutes of Health [grant numbers P30CA177558 and 5R01HG006272-03 to J.L.]. Additional financial support was received from the Lourie Foundation and through endowments at the Gluck Equine Research Center, University of Kentucky.

Related Content

All data files are available from the Geuvadis RNA sequencing project and the 1000 Genomes project. Accession numbers are within the supporting information files.

S1 File. Supplementary file, including iMapSplice algorithm and usage details, SNP-mer selection and its impact on iMapSplice performance, simulated data information, general splice junction discovery sensitivity and specificity on simulated data, and impact of genomic variant frequency and sequencing error frequency on aligner performance. https://doi.org/10.1371/journal.pone.0201554.s001 (DOCX)

S1 Table. Distributions of SNPs covered by each SNP-mer of different lengths. https://doi.org/10.1371/journal.pone.0201554.s002 (XLSX)

S2 Table. The impact of SNP-mer length on iMapSplice-phased in terms of reference allelic ratio distribution. https://doi.org/10.1371/journal.pone.0201554.s003 (XLSX)

S3 Table. Coriell ids, Geuvadis ids, and SNP numbers of the 74 real human datasets. https://doi.org/10.1371/journal.pone.0201554.s004 (XLSX)

S4 Table. Read categories correctly aligned by iMapSplice-unphased but failed by MapSplice. https://doi.org/10.1371/journal.pone.0201554.s005 (XLSX)

S5 Table. Detected splice junctions with polymorphisms at splice sites. https://doi.org/10.1371/journal.pone.0201554.s006 (XLSX)

S6 Table. Splice site dinucleotide list of canonical/noncanonical splice junctions. https://doi.org/10.1371/journal.pone.0201554.s007 (XLSX)
