Abstract

Background/Objectives: Pathway annotations of non-macromolecular (relatively small) biomolecules facilitate the biological and biomedical interpretation of metabolomics datasets. However, the low level of pathway annotation among detected biomolecules hinders this type of interpretation. Thus, predicting the pathway involvement of detected but unannotated biomolecules has high potential to improve metabolomics data analysis and omics integration. Past publications have used only datasets derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) to develop machine learning models that predict pathway involvement. To our knowledge, the Reactome knowledgebase has not yet been used to develop these types of predictive models.

Methods: We created a machine-learning-ready dataset from chemical representations of all pathway-annotated compounds available in the Reactome knowledgebase. We then trained and evaluated a multilayer perceptron binary classifier on combined metabolite-pathway paired feature vectors engineered from this new dataset.
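To make the idea of "metabolite-pathway paired feature vectors" concrete, the following is a minimal sketch under stated assumptions, not the authors' feature-engineering code. It assumes each compound already has a fixed-length numeric feature vector, and it represents each pathway, hypothetically, as the element-wise sum of its member compounds' vectors. Every (compound, pathway) pair becomes one row: the compound vector concatenated with the pathway vector, labeled 1 if the compound is annotated to that pathway and 0 otherwise. The function names `pathway_vector` and `build_pairs` are illustrative only.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def pathway_vector(members: Set[str],
                   compound_features: Dict[str, Sequence[float]]) -> Vector:
    # Hypothetical pathway representation: element-wise sum of the
    # feature vectors of the compounds annotated to the pathway.
    dim = len(next(iter(compound_features.values())))
    total = [0.0] * dim
    for cid in members:
        for i, value in enumerate(compound_features[cid]):
            total[i] += value
    return total


def build_pairs(compound_features: Dict[str, Sequence[float]],
                pathway_members: Dict[str, Set[str]]
                ) -> List[Tuple[str, str, Vector, int]]:
    # One row per (compound, pathway) pair: concatenated features plus a
    # binary label (1 = compound is annotated to the pathway, 0 = not).
    rows = []
    for pid, members in pathway_members.items():
        pvec = pathway_vector(members, compound_features)
        for cid, cvec in compound_features.items():
            features = [float(v) for v in cvec] + pvec
            label = 1 if cid in members else 0
            rows.append((cid, pid, features, label))
    return rows
```

A binary classifier such as a multilayer perceptron is then trained on the `features` column to predict `label`. Because every compound is paired with every pathway, negative pairs typically far outnumber positive ones.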

Results: Models trained on a prior corresponding KEGG dataset with 502 pathways achieved a mean Matthews correlation coefficient (MCC) of 0.847 with a standard deviation of 0.0098, whereas models trained on the Reactome dataset with 3985 pathways performed better, with a mean MCC of 0.916, albeit with a higher standard deviation of 0.0149.
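For readers unfamiliar with the metric, the MCC of a binary classifier is computed from its confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It ranges from −1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction) and remains informative when one class heavily outnumbers the other, as is likely for metabolite-pathway pairs. The sketch below is a plain illustration of the formula. Returning 0.0 when the denominator is zero is a common convention and an assumption here, not a detail taken from the article.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Tally the binary confusion matrix (labels are 0 or 1).
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when any confusion-matrix margin is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The mean and standard deviation reported above summarize MCC values across multiple trained models. Python's `statistics.mean` and `statistics.stdev` could aggregate such per-model scores, although which standard-deviation estimator the authors used is not stated here.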

Conclusions: These results indicate that the pathways in Reactome can also be effectively predicted, greatly increasing the number of human-defined pathways available for prediction.

Document Type

Article

Publication Date

3-2025

Notes/Citation Information

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Digital Object Identifier (DOI)

https://doi.org/10.3390/metabo15030161

Funding Information

This research was funded by the National Science Foundation, grant number 2020026 (PI Moseley), and by the National Institutes of Health, grant number P42 ES007380 (University of Kentucky Superfund Research Program Grant; PI Pennell). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation or the National Institute of Environmental Health Sciences.