Abstract

Background: The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the principles of data stewardship to be findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to their database entries via their web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, there is no software that provides extensive command line level support for KEGG access and utilization.

Results: We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use-cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user defined) number of database entries from the KEGG API. Moreover, this functionality is implemented to efficiently utilize multiple central processing unit cores as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across a single or multiple processes, with recommendations provided based on extensive testing and practical network considerations.

Conclusions: The new kegg_pull package enables new flexible KEGG retrieval use cases not available in previous software packages. The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances.
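
To illustrate what pulling an arbitrary number of entries from the KEGG REST API involves, the following minimal Python sketch batches entry IDs into request URLs. It is not kegg_pull's implementation or API. The helper names `chunk_entry_ids` and `build_get_urls` are hypothetical, and the sketch assumes the public `https://rest.kegg.jp` endpoint and the limit of 10 entries per `get` request given in the KEGG API documentation. It only constructs URLs and performs no network access.

```python
from typing import Iterator, List

# Assumptions for illustration: the public KEGG REST endpoint and the
# per-request entry limit described in the KEGG API documentation.
KEGG_REST_BASE = "https://rest.kegg.jp"
MAX_ENTRIES_PER_GET = 10


def chunk_entry_ids(entry_ids: List[str], size: int = MAX_ENTRIES_PER_GET) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` entry IDs (hypothetical helper)."""
    for start in range(0, len(entry_ids), size):
        yield entry_ids[start:start + size]


def build_get_urls(entry_ids: List[str]) -> List[str]:
    """Build one KEGG 'get' URL per batch, joining the IDs in a batch with '+'."""
    return [
        f"{KEGG_REST_BASE}/get/{'+'.join(batch)}"
        for batch in chunk_entry_ids(entry_ids)
    ]
```

Each resulting URL could then be fetched independently, which is what makes the workload straightforward to distribute across several processes and to retry per batch when a request fails.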

Document Type

Article

Publication Date

3-2023

Notes/Citation Information

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Digital Object Identifier (DOI)

https://doi.org/10.1186/s12859-023-05208-0

Funding Information

This work has been supported by the National Science Foundation [NSF 2020026 to H.N.B.M.] and the National Institutes of Health [NIH CF R03OD030603 to H.N.B.M.].
