Helicobacter pylori is a genetically diverse bacterial species that colonizes the stomach in about half of the human population. Most persons colonized by H. pylori remain asymptomatic, but the presence of this organism is a risk factor for gastric cancer. Multiple populations and subpopulations of H. pylori with distinct geographic distributions are recognized. Genetic differences among these populations might be a factor underlying geographic variation in gastric cancer incidence. Relatively little is known about the genomic features of African H. pylori strains compared to other populations of strains. In this study, we first analyzed the genomes of H. pylori strains from seven globally distributed populations or subpopulations and identified encoded proteins that exhibited the highest levels of sequence divergence. These included secreted proteins, an LPS glycosyltransferase, fucosyltransferases, proteins involved in molybdopterin biosynthesis, and the Clp protease adaptor (ClpS). Among proteins encoded by the cag pathogenicity island, CagA and CagQ exhibited the highest levels of sequence diversity. We then identified proteins in strains of Western African origin (classified as hspWAfrica by MLST analysis) with sequences that were highly divergent compared to those in other populations of strains. These included ATP-dependent Clp protease, ClpS, and proteins of unknown function. Three of the divergent protein sequences identified in West African strains were characterized by distinct insertions or deletions of up to 8 amino acids in length. These polymorphisms in rapidly evolving proteins represent robust genetic signatures for H. pylori strains of West African origin.
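
As a rough illustration of the kind of comparison summarized above, the following minimal sketch computes a mean between-population divergence for one aligned protein and reports the longest gap run in a sequence, which is one simple way to flag an insertion or deletion. It is not the pipeline used in the study: the function names, the p-distance measure, and the toy aligned sequences are all assumptions made up for illustration.

```python
from itertools import product


def p_distance(a, b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def mean_between_population_distance(pop_a, pop_b):
    """Average p-distance over every cross-population pair of sequences."""
    pairs = list(product(pop_a, pop_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def longest_gap_run(seq):
    """Length of the longest run of '-' characters in an aligned sequence."""
    longest = current = 0
    for ch in seq:
        current = current + 1 if ch == "-" else 0
        longest = max(longest, current)
    return longest


# Hypothetical aligned fragments (invented, not real H. pylori data).
west_africa = ["MKT--LLAV", "MKT--LLAI"]
europe = ["MKTGSLLAV", "MKTGSMLAV"]

divergence = mean_between_population_distance(west_africa, europe)
indel_length = longest_gap_run(west_africa[0])
print(f"mean between-population p-distance: {divergence:.3f}")
print(f"longest gap run in first West African sequence: {indel_length}")
```

Ranking proteins by a score of this kind would surface candidates for closer inspection; a real analysis would also need a proper multiple sequence alignment and a comparison against within-population variation.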

Notes/Citation Information

Published in PLOS ONE, v. 12, no. 11, e0188804, p. 1-17.

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding Information

This work was supported by the National Institutes of Health AI118932, AI039657, CA116087 (TC); and U.S. Department of Veterans Affairs 2I01BX000627 (TC).

journal.pone.0188804.s001.xlsx (13 kB)
S1 Table. Proteins exhibiting high levels of sequence divergence among strains from seven geographically distributed H. pylori populations.

journal.pone.0188804.s002.docx (12 kB)
S2 Table. Examples of proteins exhibiting a high level of sequence conservation when comparing geographically dispersed populations of H. pylori.

journal.pone.0188804.s003.docx (13 kB)
S3 Table. Characteristics of strains classified as hpEurope and hspWAfrica.

journal.pone.0188804.s004.docx (12 kB)
S4 Table. Examples of proteins exhibiting a high level of sequence conservation when comparing hspWAfrica and hpEurope populations of H. pylori.

journal.pone.0188804.s005.docx (12 kB)
S5 Table. MLST classification of H. pylori strains analyzed in this study.

journal.pone.0188804.s006.tif (1670 kB)
S1 Fig. MLST analysis of H. pylori strains known or predicted to have African or European origins.