Technological advances now make it possible to generate diverse and complex data of varying sizes in a wide range of applications, from business to engineering to medicine. In the health sciences in particular, data are being produced at an unprecedented rate across the full spectrum of scientific inquiry, spanning basic biology, clinical medicine, public health and health care systems. Leveraging these data can accelerate scientific advances, health discovery and innovation. However, data are only the raw material required to generate new knowledge, not knowledge in themselves, just as a pile of bricks would not be mistaken for a building. To solve complex scientific problems, appropriate methods, tools and technologies must be integrated with domain expertise to generate and analyze big data. This integrated, interdisciplinary approach has come to be widely known as data science. Although the discipline of data science has evolved rapidly over the past couple of decades in resource-rich countries, the situation is bleak in resource-limited settings, such as most countries in Africa, primarily due to a lack of well-trained data scientists. In this paper, we present a roadmap for building capacity in health data science in Africa to help spur health discovery and innovation, and we propose a potentially sustainable solution consisting of three key activities: graduate-level training, faculty development, and stakeholder engagement. We also outline potential challenges and mitigating strategies.

Notes/Citation Information

Published in Frontiers in Public Health, v. 9, article 710961.

© 2021 Beyene, Harrar, Altaye, Astatkie, Awoke, Shkedy and Mersha

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Funding Information

JB acknowledges partial support by the Natural Sciences and Engineering Research Council (NSERC) of Canada, grant RGPIN-2009_293295. JB holds the John D. Cameron Endowed Chair in the Genetic Determinants of Chronic Diseases, Department of Health Research, Methods, Evidence, and Impact, McMaster University. TM acknowledges partial support by the National Heart, Lung, and Blood Institute (NHLBI), grant R01 HL132344.

Related Content

The Supplementary Material for this article can be found online at https://www.frontiersin.org/articles/10.3389/fpubh.2021.710961/full#supplementary-material and is also available for download as the additional file at the end of this record.