Degree Name

Master of Science (MS)

Document Type

Master's Thesis




Computer Science

First Advisor

Dr. Sally Ellingson

Second Advisor

Dr. Nathan Jacobs


To reduce the time and cost of drug discovery, machine learning is being used to automate much of the process. However, the size and complexity of molecular data make applying machine learning especially challenging. Considerable effort goes into engineering the features used to train machine learning models, which costs substantial time and relies on the knowledge of domain experts to be most effective. The purpose of this work is to demonstrate data-driven approaches to feature selection and extraction that decrease the amount of expert knowledge required to model interactions between proteins and drug molecules.
