Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases
Metabolic flux analysis requires both a reliable metabolic model and reliable metabolic profiles to characterize metabolic reprogramming. Advances in analytical methodologies enable the production of high-quality metabolomics datasets capturing isotopic flux. However, useful metabolic models can be difficult to derive because relatively complete atom-resolved metabolic networks are lacking for a variety of organisms, including human. Here, we developed a neighborhood-specific graph coloring method that creates unique identifiers for each atom in a compound, facilitating construction of an atom-resolved metabolic network. This method is also guaranteed to generate the same identifier for symmetric atoms, enabling automatic identification of possible additional mappings caused by molecular symmetry. Furthermore, a compound coloring identifier derived from the corresponding atom coloring identifiers can be used for compound harmonization across various metabolic network databases, which is an essential first step in network integration. With the compound coloring identifiers, 8865 correspondences between KEGG (Kyoto Encyclopedia of Genes and Genomes) and MetaCyc compounds were detected, with 5451 of them confirmed by other identifiers provided by the two databases. In addition, we found that the Enzyme Commission (EC) numbers of reactions can be used to validate possible correspondence pairs, with 1848 unconfirmed pairs validated by commonality in reaction ECs. Moreover, we were able to detect various issues and errors with compound representation in the KEGG and MetaCyc databases using compound coloring identifiers, demonstrating the usefulness of this methodology for database curation.
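As a rough illustration of the idea described in the abstract, the sketch below uses a generic iterative color refinement over a molecular graph. It is not the authors' neighborhood-specific algorithm; it only shows how atom identifiers built from each atom's neighborhood come out equal for symmetric atoms, and how a compound identifier can be derived from them independently of atom ordering. The function names, the input format (element symbols plus index pairs for bonds), and the use of truncated SHA-1 digests are all assumptions made for this example. Bond orders, charges, and stereochemistry are ignored.

```python
import hashlib
from collections import Counter


def _digest(text):
    # Deterministic short label (assumption: 12 hex chars is enough for a demo).
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def atom_colors(elements, bonds):
    """Return one identifier per atom from generic color refinement.

    elements: list of element symbols, one per atom (hypothetical input format).
    bonds: list of (i, j) atom-index pairs; bond order is ignored in this sketch.
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Start from the element symbol, then repeatedly fold in neighbor colors.
    colors = [_digest(e) for e in elements]
    while True:
        refined = [
            _digest(colors[i] + "|" + ",".join(sorted(colors[j] for j in neighbors[i])))
            for i in range(n)
        ]
        # Stop once another round no longer splits any color class.
        if len(set(refined)) == len(set(colors)):
            return colors
        colors = refined


def compound_identifier(elements, bonds):
    """Combine the atom identifiers into one order-independent compound identifier."""
    counts = Counter(atom_colors(elements, bonds))
    return _digest(";".join(f"{color}x{k}" for color, k in sorted(counts.items())))


# Propane heavy atoms (C-C-C): the two terminal carbons are symmetric.
propane = atom_colors(["C", "C", "C"], [(0, 1), (1, 2)])
print(propane[0] == propane[2], propane[0] == propane[1])

# Ethanol heavy atoms listed in two different atom orders, as two databases might.
ethanol_a = compound_identifier(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = compound_identifier(["O", "C", "C"], [(0, 1), (1, 2)])
dimethyl_ether = compound_identifier(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol_a == ethanol_b, ethanol_a == dimethyl_ether)
```

Plain refinement of this kind can assign the same identifier to atoms or compounds that are not actually equivalent, which is consistent with the supplementary material listing representative compounds that cannot be distinguished by coloring identifier (Figure S3).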
The work was supported in part by grants NSF 1419282 (PI Moseley) and NSF 2020026 (PI Moseley).
All data used and the results generated in this manuscript are available at https://doi.org/10.6084/m9.figshare.12894008.v1.
The following are available online at https://www.mdpi.com/2218-1989/10/9/368/s1, Figure S1: Derived coloring identifier, Figure S2: KEGG compound with S-containing aromatic ring, Figure S3: Representative compounds that cannot be distinguished by coloring identifier, Figure S4: KEGG Compound C00047, Table S1: Compounds with the same coloring identifiers, which includes R groups, Table S2: Generation of atom identifiers for compound C00047 via graph coloring method, Spreadsheet S1: All pairs detected by coloring identifiers, Spreadsheet S2: Inconsistency between KEGG and MetaCyc. They are also available for download as the additional file listed at the end of this record.
Jin, Huan; Mitchell, Joshua M.; and Moseley, Hunter N. B., "Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases" (2020). Molecular and Cellular Biochemistry Faculty Publications. 179.
Biochemistry, Biophysics, and Structural Biology Commons, Bioinformatics Commons, Oncology Commons
Published in Metabolites, v. 10, issue 9, 368.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).