Interrupted adenylation (A) domains are key to the immense structural diversity seen in the nonribosomal peptide (NRP) class of natural products (NPs). Interrupted A domains are A domains that contain within them the catalytic portion of another domain, most commonly a methylation (M) domain. It has been well documented that methylation events occur with extreme specificity on either the backbone (N-) or side chain (O- or S-) of the amino acid (or amino acid-like) building blocks of NRPs. Here, through taxonomic and phylogenetic analyses as well as multiple sequence alignments, we evaluated the similarities and differences between interrupted A domains. We probed their taxonomic distribution among bacterial organisms and their evolutionary relatedness, and described the conserved motifs of each type of M domain found embedded in interrupted A domains. Additionally, we categorized interrupted A domains and the M domains within them into a total of seven distinct families and six different types, respectively. The families of interrupted A domains include two new families, 6 and 7, that possess new architectures. Rather than being interrupted between the previously described a2–a3 or a8–a9 of the ten conserved A domain sequence motifs (a1–a10), family 6 contains an M domain between a6–a7, a previously unknown interruption site. Family 7 demonstrates that di-interrupted A domains exist in Nature: its members contain one M domain between a2–a3 and another between a6–a7, a novel arrangement. These in-depth investigations of amino acid sequences deposited in the NCBI database highlighted the prevalence of interrupted A domains in bacterial organisms, with each family of interrupted A domains having a different taxonomic distribution. They also emphasized the importance of utilizing a broad range of bacteria for NP discovery. Categorizing the families of interrupted A domains and the types of M domains gave a better understanding of the trends among naturally occurring interrupted A domains, revealing patterns and insights into how to harness them in future engineering studies.

Notes/Citation Information

Published in RSC Chemical Biology, v. 1, issue 4.

© The Royal Society of Chemistry 2020

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.

Digital Object Identifier (DOI)

10.1039/d0cb00092b

Funding Information

This study was supported by a National Science Foundation (NSF) CAREER Award MCB-1149427 (to S. G.-T.) and by startup funds from the University of Kentucky College of Pharmacy (to S. G.-T.). T. A. L. was supported in part by a 2019–2020 Pharmaceutical Sciences Excellence in Graduate Achievement Fellowship from the College of Pharmacy at the University of Kentucky as well as a 2019–2020 Pre-doctoral Fellowship in Pharmaceutical Sciences from the American Foundation for Pharmaceutical Education (AFPE).

Related Content

Electronic supplementary information (ESI) available: Experimental procedures for the construction of the data sets used for the taxonomic and phylogenetic trees, multiple sequence alignments and boundary identification of interrupted A domains, and identification of M domain conserved motifs and assignment of M domain types. Detailed information on families 1–6 (Tables S1–S7) and conserved regions of M domain types (Table S8). Taxonomic trees of family 1–4 interrupted A domains (Fig. S1–S5), taxonomic and phylogenetic trees of families 5a, 5b, and 6 (Fig. S6 and S7), phylogenetic trees of families 1–4 (Fig. S8–S12), and full-length multiple sequence alignments of family 1–7 interrupted A domains (Fig. S13–S21). See DOI: 10.1039/d0cb00092b

The supplementary information is also available for download as the additional file listed at the end of this record.

d0cb00092b1.pdf (3994 kB)
Supplementary information