Background: Redundant hierarchical relations refer to such patterns as two paths from one concept to another, one with length one (direct) and the other with length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected in the ontology quality assurance process. Detecting and eliminating redundant relations would help improve the results of all methods relying on the relevant ontological systems as knowledge source, such as the computation of semantic distance between concepts and for ontology matching and alignment.
Results: This paper introduces a novel and scalable approach, called FEDRR – Fast, Exhaustive Detection of Redundant Relations – for quality assurance work during ontological evolution. FEDRR combines the algorithm ideas of Dynamic Programming with Topological Sort, for exhaustive mining of all redundant hierarchical relations in ontological hierarchies, in O(c·|V|+|E|) time, where |V| is the number of concepts, |E| is the number of the relations, and c is a constant in practice. Using FEDRR, we performed exhaustive search of all redundant is-a relations in two of the largest ontological systems in biomedicine: SNOMED CT and Gene Ontology (GO). 372 and 1609 redundant is-a relations were found in the 2015-09-01 version of SNOMED CT and 2015-05-01 version of GO, respectively. We have also performed FEDRR on over 190 source vocabularies in the UMLS - a large integrated repository of biomedical ontologies, and identified six sources containing redundant is-a relations. Randomly generated ontologies have also been used to further validate the efficiency of FEDRR.
Conclusions: FEDRR provides a generally applicable, effective tool for systematic detecting redundant relations in large ontological systems for quality improvement.
Digital Object Identifier (DOI)
The project described was supported by the National Center for Advancing Translational Sciences, UL1TR000117, and in part by the Software Solutions of Applied Research and Technology Program at Western Kentucky University.
SNOMED CT, GO, and UMLS used in this paper are downloaded from their own websites, which are freely available. The source code of FEDRR is available at https://github.com/gmingx/fedrr.
Xing, Guangming; Zhang, Guo-Qiang; and Cui, Licong, "FEDRR: Fast, Exhaustive Detection of Redundant Hierarchical Relations for Quality Improvement of Large Biomedical Ontologies" (2016). Institute for Biomedical Informatics Faculty Publications. 2.