Author ORCID Identifier

https://orcid.org/0009-0006-1151-4420

Date Available

7-16-2024

Year of Publication

2024

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Mathematics

First Advisor

Dr. Duc Nguyen

Abstract

Drug discovery is a highly complicated and time-consuming process. One of the main challenges in drug development is predicting whether a drug-like molecule will interact with a specific target protein. Accurate prediction of this interaction accelerates target validation and drug development. Recent research in the biomolecular sciences has shown significant interest in algebraic graph-based models for representing molecular complexes and predicting drug-target binding affinity. In this thesis, we present algebraic graph-based molecular representations for building data-driven scoring functions (SFs) that use extended atom types to capture wide-range interactions between targets and drug candidates. Our model represents the protein-ligand complex with multiscale weighted colored subgraphs, colored according to SYBYL atom types. Combined with machine learning and deep learning techniques such as gradient-boosting decision trees (GBDT), random forests (RF), support vector machines (SVM), extreme gradient boosting (XGBoost), convolutional neural networks (CNNs), and graph convolutional neural networks (GCNs), our SFs outperformed numerous state-of-the-art models on three groups of benchmarks: various PDBbind benchmark datasets for binding affinity scoring power; the D3R dataset, a worldwide grand challenge in drug design; and various blood-brain permeability prediction datasets.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.373

Funding Information

This study was supported by:

1. National Science Foundation Grant (no.: 2053284) in 2021,

2. National Science Foundation Grant (no.: 2151802) in 2022,

3. National Science Foundation Grant (no.: 2245903) in 2023.

Included in

Mathematics Commons
