Abstract
Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene and gene-product centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats, a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the improvements in annotation enrichment obtained by re-interpreting edges that would otherwise be omitted by traditional ancestor path-tracing methods. Specifically, we show that GOcats’ unique handling of relations improves enrichment over conventional methods in the analysis of two different gene-expression datasets: a breast cancer microarray dataset and several horse cartilage development RNAseq datasets. With the breast cancer microarray dataset, 182 of the 217 GO terms identified as significantly enriched by the conventional path traversal method showed improved enrichment when GOcats’ path traversal was used, a significant improvement (one-sided binomial test p-value = 1.86E-25). We also found new significantly enriched terms using GOcats, whose biological relevance has been experimentally demonstrated elsewhere. Likewise, on the horse RNAseq datasets, we observed a significant improvement in GO term enrichment when using GOcats’ path traversal: one-sided binomial test p-values ranged from 1.32E-03 to 2.58E-44.
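The one-sided binomial test quoted in the abstract can be illustrated with a short calculation. The sketch below is not taken from the authors' scripts. It assumes a null hypothesis in which each of the 217 terms is equally likely to improve or not (null probability 0.5, which the abstract does not state), and the function name `one_sided_binomial_p` is hypothetical. Under that assumption, the upper-tail probability for 182 of 217 comes out close to the reported 1.86E-25.

```python
from fractions import Fraction
from math import comb


def one_sided_binomial_p(successes, trials, null_prob=Fraction(1, 2)):
    """Return P(X >= successes) for X ~ Binomial(trials, null_prob).

    The tail is summed with exact rational arithmetic and converted to a
    float only at the end, so very small p-values do not lose precision.
    The default null probability of 1/2 is an assumption of this sketch.
    """
    tail = sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )
    return float(tail)


# 182 of 217 terms improved, as reported for the breast cancer dataset.
p_value = one_sided_binomial_p(182, 217)
print(f"one-sided binomial p-value: {p_value:.2e}")
```

Only the Python standard library is used here; a statistics package would work equally well, but exact fractions make the arithmetic easy to check by hand.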
Document Type
Article
Publication Date
8-15-2019
Digital Object Identifier (DOI)
https://doi.org/10.1371/journal.pone.0220728
Funding Information
This work was supported in part by grants NSF 1419282 (to HNBM), NIH 1U24DK097215-01A1 (to HNBM), and NIH UL1TR001998-01.
Related Content
Supplemental figures and data can be found at https://figshare.com/s/952a4d001cc8850d6d5e. GOcats (https://pypi.python.org/pypi/GOcats) is an open-source Python software package released under the BSD-3 License and also available from the GitHub repository at https://github.com/MoseleyBioinformaticsLab/GOcats. Documentation can be found at http://gocats.readthedocs.io/en/latest/. The exact version of GOcats used in this study, along with all scripts used to generate results, can be found in the FigShare repository at https://figshare.com/s/9d55b2e5932992e6a068. Software and full results are available at http://software.cesb.uky.edu.
Repository Citation
Hinderer, Eugene Waverly III; Flight, Robert M.; Dubey, Rashmi; MacLeod, James N.; and Moseley, Hunter N. B., "Advances in Gene Ontology Utilization Improve Statistical Power of Annotation Enrichment" (2019). Maxwell H. Gluck Equine Research Center Faculty Publications. 42.
https://uknowledge.uky.edu/gerc_facpub/42
S1 File. Comparing adjusted p-values between omitted has_part and GOcats part_of_some edges. https://doi.org/10.1371/journal.pone.0220728.s001
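The contrast named in S1 File, omitting has_part edges versus re-interpreting them as part_of_some edges, can be pictured with a toy graph. The sketch below is an illustration only: it does not use the GOcats API, the term names are hypothetical placeholders rather than real GO terms, and it assumes that a has_part edge running from a whole to its part is re-read as a part_of_some edge running from the part to the whole.

```python
from collections import defaultdict

# Hypothetical toy ontology: (source, relation, target) triples.
EDGES = [
    ("term_B", "is_a", "term_A"),
    ("term_C", "part_of", "term_B"),
    ("term_W", "is_a", "term_A"),
    ("term_W", "has_part", "term_P"),  # points from the whole to its part
]


def build_parent_map(edges, reinterpret_has_part):
    """Map each term to the broader-scoped terms it links to directly."""
    parents = defaultdict(set)
    for source, relation, target in edges:
        if relation in ("is_a", "part_of"):
            parents[source].add(target)
        elif relation == "has_part" and reinterpret_has_part:
            # Assumed re-interpretation: reverse the edge so that the
            # part maps to the whole ("part_of_some").
            parents[target].add(source)
        # Otherwise the has_part edge is omitted, as in conventional tracing.
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


conventional = ancestors("term_P", build_parent_map(EDGES, False))
reinterpreted = ancestors("term_P", build_parent_map(EDGES, True))
print(sorted(conventional))   # term_P reaches nothing when has_part is omitted
print(sorted(reinterpreted))  # term_P reaches term_W and, through it, term_A
```

In this toy case, annotations on `term_P` can only contribute to the enrichment of `term_W` and `term_A` once the edge is re-interpreted, which is the kind of additional mapping the comparison in S1 File concerns.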
Included in
Biochemistry, Biophysics, and Structural Biology Commons, Bioinformatics Commons, Large or Food Animal and Equine Medicine Commons, Oncology Commons
Notes/Citation Information
Published in PLOS ONE, v. 14, no. 8, 0220728, p. 1-20.
© 2019 Hinderer et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.