Author ORCID Identifier

https://orcid.org/0009-0003-8496-5271

Date Available

4-24-2024

Year of Publication

2024

Document Type

Master's Thesis

Degree Name

Master of Computer and Information Science (MCIS)

College

Engineering

Department/School/Program

Computer Science

Faculty

Dr. Ramakanth Kavuluru

Faculty

Dr. Simone Silvestri

Abstract

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. Using the RareDis information extraction dataset, we evaluate pipelines that perform named entity recognition (NER) followed by relation extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag significantly behind pipeline models. These results also hold on a second E2ERE dataset for chemical-protein interactions.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.77

Funding Information

This study was supported by National Institutes of Health grant no. R01LM013240 in 2020.
