Year of Publication


Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation




Computer Science

First Advisor

Dr. Ramakanth Kavuluru

Second Advisor

Dr. Zongming Fei


Identifying new potential treatment options for medical conditions that cause human disease burden is a central task of biomedical research. Since all candidate drugs cannot be tested with animal and clinical trials, in vitro approaches are first attempted to identify promising candidates. Likewise, identifying other essential relations (e.g., causation, prevention) between biomedical entities is also critical to understand biomedical processes. Hence, it is crucial to develop automated relation prediction systems that can yield plausible biomedical relations to expedite the discovery process. In this dissertation, we demonstrate three approaches to predict treatment relations between biomedical entities for the drug repositioning task using existing biomedical knowledge bases. Our approaches can be broadly labeled as link prediction or knowledge base completion in computer science literature. Specifically, first we investigate the predictive power of graph paths connecting entities in the publicly available biomedical knowledge base, SemMedDB (the entities and relations constitute a large knowledge graph as a whole). To that end, we build logistic regression models utilizing semantic graph pattern features extracted from the SemMedDB to predict treatment and causative relations in Unified Medical Language System (UMLS) Metathesaurus. Second, we study matrix and tensor factorization algorithms for predicting drug repositioning pairs in repoDB, a general purpose gold standard database of approved and failed drug–disease indications. The idea here is to predict repoDB pairs by approximating the given input matrix/tensor structure where the value of a cell represents the existence of a relation coming from SemMedDB and UMLS knowledge bases. The essential goal is to predict the test pairs that have a blank cell in the input matrix/tensor based on the shared biomedical context among existing non-blank cells. Our final approach involves graph convolutional neural networks where entities and relation types are embedded in a vector space involving neighborhood information. Basically, we minimize an objective function to guide our model to concept/relation embeddings such that distance scores for positive relation pairs are lower than those for the negative ones. Overall, our results demonstrate that recent link prediction methods applied to automatically curated, and hence imprecise, knowledge bases can nevertheless result in high accuracy drug candidate prediction with appropriate configuration of both the methods and datasets used.

Digital Object Identifier (DOI)

Funding Information

Funding for this research was provided by the Ministry of National Education, the Republic of Turkey.

Available for download on Thursday, October 22, 2020