Author ORCID Identifier

https://orcid.org/0009-0003-8496-5271

Date Available

4-24-2024

Year of Publication

2024

Degree Name

Master of Computer and Information Science (MCIS)

Document Type

Master's Thesis

College

Engineering

Department/School/Program

Computer Science

First Advisor

Dr. Ramakanth Kavuluru

Abstract

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate pipelines that run named entity recognition (NER) followed by relation extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag significantly behind pipeline models. Our results also hold for a second E2ERE dataset for chemical-protein interactions.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.77

Funding Information

This study was supported by National Institutes of Health grant no. R01LM013240 in 2020.
