Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced mRNA from a pool of forty-three different tissues. From these, we derived the structures of 68,594 transcripts. In addition, we identified 301,829 positions with SNPs or small indels within these transcripts relative to EquCab2. Interestingly, 780 variants extend the open reading frame of the transcript and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross-species transcriptional and genomic comparisons.
This work was supported by the National Science Foundation (Crosscut-EF-0850237 to JL and JNM and 1054631 to JL) and a Kentucky Infrastructure for Biomedical Research Excellence award (KY-INBRE, 5P20RR016481-09) from the National Institutes of Health. Additional financial support was received from the Lourie Foundation and through endowments at the Gluck Equine Research Center, University of Kentucky. TSK’s effort was supported in part by the KY IDeA Networks of Biomedical Research Excellence (Cooper PI) NIH/NIGMS 5P20GM103436-13, and this work was conducted in part using the resources of the University of Louisville’s research computing group and the Cardinal Research Cluster. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.; Zeng, Zheng; Liu, Jinze; Orlando, Ludovic; and MacLeod, James N., "Annotation of the Protein Coding Regions of the Equine Genome" (2015). Veterinary Science Faculty Publications. 31.
S1 Dataset: Variants that extend ORF found in both genomic (homozygous) and transcriptomic data.
S2_Dataset.pdf (2678 kB)
S2 Dataset: Tissues and horses used to create the RNA pool, including Agilent Bioanalyzer trace results.
S1_Fig.pdf (56 kB)
S1 Fig: BLAST thresholds for 1, 2, or multi-exon transcripts.