Author ORCID Identifier

Date Available


Year of Publication


Degree Name

Master of Computer and Information Science (MCIS)

Document Type

Master's Thesis




Computer Science

First Advisor

Dr. Ramakanth Kavuluru


End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying entities in text and classifying the semantic relationships between them. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases, where entities are often discontinuous or nested. Using the RareDis information extraction dataset, we evaluate pipelines that chain named entity recognition (NER) and relation extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag significantly behind pipeline models. These results also hold on a second E2ERE dataset covering chemical-protein interactions.

Digital Object Identifier (DOI)

Funding Information

This study was supported by National Institutes of Health grant R01LM013240 in 2020.