Author ORCID Identifier

https://orcid.org/0000-0003-4759-5534

Date Available

8-15-2024

Year of Publication

2024

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Agriculture

Department/School/Program

Veterinary Science

Advisor

Dan K. Howe

Abstract

Sarcocystis neurona is an apicomplexan parasite and the main etiologic agent of Equine Protozoal Myeloencephalitis (EPM), the most commonly diagnosed neurologic disease of horses in the Americas. Technical advances that arose from the Human Genome Project between 1990 and 2003, and the corresponding "omics" revolution in research, have given scientists novel approaches to studying organisms. Herein, we sought to leverage bioinformatic advances and new sequencing platforms to improve understanding of the biology of S. neurona. At the outset of this project, reference genomes from two different S. neurona strains (SN3.e1 and So SN1) were available. A new and improved S. neurona reference genome has been produced (here referred to as SN3UKY). Oxford Nanopore and PacBio Hi-C sequence data from a more recent, non-clonal SN3 strain were generated, assembled, and polished with Illumina sequence data, and gene structures informed by transcript and protein evidence were manually annotated. This effort increased the N50 of the genome assembly by ~25% (+9,624,906 bp) and reduced assembly fragmentation by ~29%. Genome annotation was assisted by both transcript evidence and computational gene structure prediction. Complete, single-copy gene content increased by 2.59% (BUSCO Apicomplexan database) or 0.67% (Coccidian database). Depending on the reference database, 32 or 27 expected genes were missing from both the original and the new SN3 genome assembly annotations. Most of the missing genes encode hypothetical proteins or lack a functional description in databases, but ribosomal protein L7/L12 and GTP 3',8-cyclase are of note; their absence potentially indicates species-specific divergence from the rest of the Apicomplexa and Coccidia, respectively.
The newly assembled S. neurona SN3UKY reference genome was then used to investigate the gene family of surface antigen paralogues, termed SnSAGs and SAG1-related sequences (SRSs), which are immunogenic and used for EPM diagnostics. Prior analyses of multiple S. neurona isolates revealed three major surface antigens, SnSAG1, SnSAG5, and SnSAG6, that are isolate specific and mutually exclusive of one another. Assemblies of Illumina whole-genome sequencing (WGS) data from four different S. neurona isolates, compared to the SN3 reference, demonstrated that the genes for SnSAG1, SnSAG5, and SnSAG6 are syntenic in their respective S. neurona isolates. All other SAG/SRS family paralogues were found to be shared amongst the S. neurona isolates.
Illumina WGS was also performed on the genomes of several additional Sarcocystis species. Gene prediction models were generated using transcriptomic evidence from S. neurona SN3. These models were used to predict genes in the other Sarcocystis species, and their quality was assessed. Transcript evidence from S. neurona improved the quality of gene prediction compared to models based on the evolutionarily close relative Toxoplasma gondii. The quality of the genome assembly was also found to have a large impact on the quality of gene prediction.
Since the SN3 strain has been propagated extensively in vitro, it was posited that the noted genomic differences were due to genomic loss during cell culture passage rather than natural strain variation. To address this question, Illumina WGS was performed on a stock of the SN3 strain cryopreserved in 1991, soon after isolation. Mapping of the SN3.1991 Illumina sequence reads to the SN3UKY reference genome predicted that 1.73% (~2.19 Mb ± 509 kb) of the SN3 genome had been lost during ~33 years of asexual replication in vitro. Assembly of the non-mapping reads produced an average of 2,662 contiguous sequences ≥ 500 bp derived from deletions (i.e., non-mapping contigs). The largest of these deletions was at least 102 kb and was confirmed experimentally via PCR. Structural annotation of the deletions predicted the loss of 1,727 genes during in vitro propagation. Of these, 273 genes could be functionally categorized to a molecular function: 210 (76.9%) were predicted to be involved in binding, 173 (63.3%) were predicted to be involved in catalytic activity, 27 (9.89%) were predicted to be involved in transcription regulation, and 43 (15.75%) successfully aligned to entries in the NCBI non-redundant sequence database and were assigned a gene description. Of note, homologues of the rhoptry metalloprotease toxolysin 1 (TLN1) and the bradyzoite antigen 1 (BAG1), both characterized in Toxoplasma gondii, were among the genes predicted to have been lost. Collectively, the genes located within the deleted regions may encode proteins that are important during other life cycle stages but modulate growth of the asexual development stages present in cell culture. This work will improve our capacity to accurately study S. neurona and other closely related organisms.
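
For readers unfamiliar with the N50 statistic cited in the abstract, the following is a minimal sketch of how it is conventionally computed from a list of contig or scaffold lengths. It is illustrative only and is not taken from the dissertation or its supplemental files; the function name `n50` and the toy lengths are assumptions made for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in bp), not real S. neurona data.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

A higher N50 for the same total assembly size generally indicates that the sequence is contained in fewer, longer pieces, which is why it is commonly reported alongside measures of fragmentation.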

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.397

Supplemental File 2.1.txt (29 kB)
Full Adobe Illustrator System Info

Supplemental File 2.2.sh (1 kB)
Splice Site Extraction

Supplemental File 2.3.R (5 kB)
Generate and Sort GFF3 File

Supplemental File 2.4.R (1 kB)
Predictions Not Overlapped by Manual Annotation

Supplemental Table 3.1.xlsx (21 kB)
SnSAG Comparison Including Expression

Supplemental Data 3.1.fasta (176 kB)
SnSAG1/SnSAG5/SnSAG6 and surrounding locus

Supplemental Data 3.2.gff3 (34 kB)
Annotation of All SnSAG and SRS Domains

Supplemental Data 3.3.fasta (21 kB)
SnSAG Amino Acid Sequences

Supplemental Data 3.4.txt (42 kB)
Alignment of SnSAG Amino Acid Sequences

Supplemental File 4.1.html (1551 kB)
Raw Sequence QC

Supplemental File 4.3_small.pdf (12986 kB)
Genome Assembly Taxonomic Visualization

Supplemental File 4.2.html (1554 kB)
Minimally Processed Sequence QC
