Abstract
Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.
Methods—Our approach involves three stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of the fully specified concept names in these non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with the attributes of its ancestor concepts within the same subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to the existing IS-A relations in SNOMED CT. For a concept pair within a non-lattice subgraph, if a subset relation is identified but no corresponding IS-A relation is present in the transitive closure of SNOMED CT’s IS-A relations, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation.
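Stages 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: lexical attributes are approximated by lowercase whitespace tokens of the concept name, the concept whose enriched attribute set is a proper superset is treated as the more specific (child) concept, and the transitive closure is computed only inside the subgraph. The function names and the toy concepts (`u1`, `u2`, `l1`, `l2`) are hypothetical and are not SNOMED CT content.

```python
from itertools import permutations


def ancestors(concept, parents):
    """Return all ancestors of `concept` reachable via IS-A edges in the subgraph."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def lexical_attributes(name):
    # Assumption: lowercase whitespace tokens stand in for lexical attributes.
    return set(name.lower().split())


def suggest_missing_isa(names, parents):
    """Suggest (child, parent) pairs whose attribute sets imply an absent IS-A."""
    closure = {c: ancestors(c, parents) for c in names}

    # Stage 2: enrich each concept with its ancestors' attributes.
    enriched = {}
    for c in names:
        attrs = lexical_attributes(names[c])
        for a in closure[c]:
            attrs |= lexical_attributes(names[a])
        enriched[c] = attrs

    # Stage 3: a proper-subset relation without an IS-A in the closure
    # is reported as a candidate missing relation.
    suggestions = []
    for child, parent in permutations(names, 2):
        if enriched[parent] < enriched[child] and parent not in closure[child]:
            suggestions.append((child, parent))
    return sorted(suggestions)


# Hypothetical toy subgraph: l1 and l2 are both children of u1 and u2.
toy_names = {
    "u1": "injury of foot",
    "u2": "open wound",
    "l1": "open wound injury of foot",
    "l2": "open wound injury of foot with infection",
}
toy_parents = {"l1": ["u1", "u2"], "l2": ["u1", "u2"]}

print(suggest_missing_isa(toy_names, toy_parents))  # [('l2', 'l1')]
```

In this toy case the only suggestion is that `l2` IS-A `l1`: the attribute set of `l1` is strictly contained in that of `l2`, yet `l1` is not among the ancestors of `l2`. The pairs already linked by IS-A (for example `l1` and `u1`) also satisfy the subset test but are filtered out by the closure check.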
Results—A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the “Clinical Finding” and “Procedure” sub-hierarchies. Two domain experts confirmed 185 of the 223 missing IS-A relations suggested for these subgraphs, a precision of 82.96%.
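The reported precision follows directly from the two counts given above, taking the confirmed suggestions as true positives and all evaluated suggestions as the denominator:

```latex
\text{precision} = \frac{\text{confirmed suggestions}}{\text{evaluated suggestions}}
                 = \frac{185}{223} \approx 0.8296 = 82.96\%
```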
Conclusions—Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT.
Document Type
Article
Publication Date
2-2018
Digital Object Identifier (DOI)
https://doi.org/10.1016/j.jbi.2017.12.010
Funding Information
This work was supported by the National Science Foundation through grants IIS-1657306 and ACI-1626364, and the National Institutes of Health (NIH) National Center for Advancing Translational Sciences through grant UL1TR001998.
Repository Citation
Cui, Licong; Bodenreider, Olivier; Shi, Jay; and Zhang, Guo-Qiang, "Auditing SNOMED CT Hierarchical Relations Based on Lexical Features of Concepts in Non-Lattice Subgraphs" (2018). Computer Science Faculty Publications. 28.
https://uknowledge.uky.edu/cs_facpub/28
Notes/Citation Information
Published in Journal of Biomedical Informatics, v. 78, p. 177-184.
© 2017 Elsevier Inc.
This manuscript version is made available under the CC-BY-NC-ND 4.0 license: https://creativecommons.org/licenses/by-nc-nd/4.0/.
The document available for download is the author's post-peer-review final draft of the article.