Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

Chemical and Materials Engineering

First Advisor

Dr. Qing Shao


Hydrophobic deep eutectic solvents (DESs) have emerged as excellent extractants. A major challenge is the lack of an efficient tool for discovering DES candidates. Currently, the search relies heavily on researchers’ intuition or on trial and error, which leads to low success rates and can cause promising candidates to be overlooked. DES performance depends on the heterogeneous hydrogen bond environment formed by multiple hydrogen bond donors and acceptors. Understanding this environment can help establish principles for designing high-performance DESs for extraction and other separation applications. This work investigates the structure and dynamics of hydrogen bonds in eight hydrophobic DESs using molecular dynamics (MD) simulations. The results reveal the diversity of hydrogen bonds in these DESs and their impact on diffusivity and molecular association. The dominant hydrogen bonds determine whether a DES is governed by intra- or inter-component associations.
The DES-aqueous liquid–liquid interface also plays a vital role in the extraction ability of DESs, and an open question is how DES composition affects the structural features of this interface. This work therefore also investigates the density profile, dipole moment, and hydrogen bonds at hydrophobic DES-aqueous interfaces using MD simulations. The results show variations in the dipole moment and in hydrogen bond structure and dynamics at the interfaces. These variations could influence the extraction ability of DESs by adjusting the partitioning and kinetics of organic substrates in DES-aqueous biphasic systems.
Recognizing the central role that hydrogen bonds play in DES formation, this work aims to decipher the hydrogen bond features of DESs and to develop machine learning models that predict, from hydrogen bond-based descriptors, whether a system is likely to form a DES.
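
Hydrogen bond analyses of MD trajectories such as those summarized above usually start from a geometric definition of a hydrogen bond. The sketch below classifies a single donor-hydrogen-acceptor triplet. It assumes the commonly used cutoffs of 3.5 Å for the donor-acceptor distance and 30° for the hydrogen-donor-acceptor angle; these cutoffs and the function names are illustrative assumptions, not necessarily the criteria used in this work.

```python
import math

# Assumed cutoffs (hypothetical; common choices in MD hydrogen bond analysis)
DIST_CUTOFF = 3.5    # donor-acceptor distance, in angstroms
ANGLE_CUTOFF = 30.0  # hydrogen-donor-acceptor angle, in degrees


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     dist_cutoff=DIST_CUTOFF, angle_cutoff=ANGLE_CUTOFF):
    """Return True if a D-H...A triplet meets the assumed geometric criterion.

    Coordinates are (x, y, z) tuples in angstroms. Periodic boundary
    conditions are ignored in this sketch.
    """
    d_a = _sub(acceptor, donor)   # vector donor -> acceptor
    d_h = _sub(hydrogen, donor)   # vector donor -> hydrogen
    r_da = _norm(d_a)
    if r_da > dist_cutoff:
        return False
    cos_theta = sum(x * y for x, y in zip(d_a, d_h)) / (r_da * _norm(d_h))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    angle = math.degrees(math.acos(cos_theta))
    return angle < angle_cutoff
```

For example, with a donor at the origin and its hydrogen at (0.97, 0, 0), an acceptor at (2.8, 0.3, 0) is counted as hydrogen bonded (about 6° off axis), while an acceptor at (0, 2.8, 0) is not (90°). Counting such triplets frame by frame gives hydrogen bond statistics; tracking whether a given triplet stays bonded over time is the usual starting point for hydrogen bond dynamics.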
Based on our MD simulation results, we developed 30 machine learning models using ten algorithms and three types of hydrogen bond-based descriptors. Model performance was first benchmarked using the average and minimum ROC-AUC values. We also analyzed the importance of individual features in the models, and the results are consistent with the simulation-based statistical analysis. Finally, we validated the predictive ability of the models against experimental results for 34 systems. Our work reiterates the importance of hydrogen bonds in DES formation and shows the potential of machine learning for discovering new DESs.
Large protein language models (PLMs) have shown great potential to reshape protein research. A trained PLM encodes the amino acid sequence of a protein into a mathematical embedding that can be used for protein design or property prediction. Protein 3D structure is known to play an important role in protein properties and functions; however, most PLMs are trained only on sequence data and lack 3D structure information, which limits their predictive capacity in various applications. We use contrastive learning to develop a 3D structure-aware protein language model (S-PLM). The model encodes the sequence and 3D structure of proteins separately and uses a multi-view contrastive loss function to enable information exchange between the sequence and 3D structure embeddings. Our analysis shows that contrastive learning effectively incorporates 3D structure information into sequence-based embeddings.
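
The DES classifiers above are benchmarked by their average and minimum ROC-AUC. As a hedged illustration, assuming the scores come from several cross-validation folds (the fold setup and the toy numbers are hypothetical), ROC-AUC can be computed from its rank-based (Mann-Whitney) definition and then summarized per model:

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive (label 1) is
    scored above a random negative (label 0); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """folds: list of (labels, scores) pairs, one per (hypothetical) CV fold.

    Returns (average AUC, minimum AUC) across the folds.
    """
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return mean(aucs), min(aucs)


folds = [([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
         ([1, 0, 1, 0], [0.6, 0.7, 0.9, 0.2])]
print(summarize_folds(folds))  # (0.875, 0.75)
```

One plausible reason to report both numbers: the average reflects overall ranking quality, while the minimum exposes models whose performance collapses on some data split.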

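The multi-view contrastive loss in S-PLM is not specified in this abstract. As a simplified stand-in, the sketch below implements a symmetric, CLIP-style InfoNCE loss that pulls each protein's sequence embedding toward its own structure embedding and pushes it away from the other structures in the batch. The loss form, the temperature of 0.07, and the function names are assumptions for illustration only; the actual S-PLM loss may differ.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(seq_emb, struct_emb, temperature=0.07):
    """Hypothetical CLIP-style loss between paired sequence/structure embeddings.

    seq_emb[i] and struct_emb[i] describe the same protein (positive pair);
    every other pairing in the batch serves as a negative.
    """
    s = [_normalize(v) for v in seq_emb]
    t = [_normalize(v) for v in struct_emb]
    # logits[i][j] = cos(s_i, t_j) / temperature
    logits = [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
              for si in s]
    transposed = [list(col) for col in zip(*logits)]
    # Average the sequence->structure and structure->sequence directions
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


seq = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # structure i resembles sequence i
shuffled = [[0.1, 0.9], [0.9, 0.1]]  # pairs deliberately mismatched
print(symmetric_info_nce(seq, aligned) < symmetric_info_nce(seq, shuffled))  # True
```

Minimizing this loss drives matched sequence and structure embeddings together, which is the general mechanism by which contrastive training can inject structural information into a sequence encoder.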
Funding Information

  1. Startup Funds of the University of Kentucky
  2. Igniting Research Collaboration at the University of Kentucky
  3. UK Artificial Intelligence in Medicine Research Alliance Pilot (NCATS UL1TR001998 and NCI P30 CA177558)

Supplementary materials: supporting_information_for_chapter_5.pdf (32972 kB)