Date Available
4-24-2024
Year of Publication
2024
Degree Name
Master of Computer and Information Science (MCIS)
Document Type
Master's Thesis
College
Engineering
Department/School/Program
Computer Science
First Advisor
Dr. Ramakanth Kavuluru
Abstract
End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate pipelines that perform named entity recognition (NER) followed by relation extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag significantly behind pipeline models. Our results also hold for a second E2ERE dataset for chemical-protein interactions.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.77
Funding Information
This study was supported by the National Institutes of Health (grant no. R01LM013240) in 2020.
Recommended Citation
Gupta, Shashank, "LANGUAGE MODELS FOR RARE DISEASE INFORMATION EXTRACTION: EMPIRICAL INSIGHTS AND MODEL COMPARISONS" (2024). Theses and Dissertations--Computer Science. 142.
https://uknowledge.uky.edu/cs_etds/142
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Data Science Commons, Engineering Commons, Software Engineering Commons