Archived
This content is available here strictly for research, reference, and/or recordkeeping and as such it may not be fully accessible. If you work or study at University of Kentucky and would like to request an accessible version, please use the SensusAccess Document Converter.
Date Available
4-24-2024
Year of Publication
2024
Document Type
Master's Thesis
Degree Name
Master of Computer and Information Science (MCIS)
College
Engineering
Department/School/Program
Computer Science
Faculty
Dr. Ramakanth Kavuluru
Faculty
Dr. Simone Silvestri
Abstract
End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying entities in text and classifying the semantic relationships between them. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases, a domain with discontinuous and nested entities. Using the RareDis information extraction dataset, we evaluate named entity recognition (NER) to relation extraction (RE) pipelines, sequence-to-sequence models, and generative pre-trained transformer (GPT) models. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag significantly behind pipeline models. These results also hold for a second E2ERE dataset covering chemical-protein interactions.
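To make the E2ERE output concrete, the following minimal sketch shows one way to represent discontinuous entities as a tuple of character spans and to score predicted relation triples. The sentence, the entity labels `RAREDISEASE` and `SIGN`, and the relation label `produces` are hypothetical, chosen only to resemble rare disease annotations; they are not taken from the RareDis schema. The strict criterion used by the hypothetical `strict_prf` helper (a triple counts as correct only if both entities' spans and types and the relation label all match) is likewise an assumption, not necessarily the evaluation protocol used in this thesis.

```python
from dataclasses import dataclass

Span = tuple[int, int]  # (start, end) character offsets, end exclusive; Python 3.9+


@dataclass(frozen=True)  # frozen makes entities hashable, so triples can live in sets
class Entity:
    label: str                # hypothetical entity type, e.g. "SIGN"
    spans: tuple[Span, ...]   # one span = contiguous mention; several = discontinuous

    def text(self, doc: str) -> str:
        # Join the fragments of a (possibly discontinuous) mention with single spaces.
        return " ".join(doc[start:end] for start, end in self.spans)


Relation = tuple[Entity, str, Entity]  # (head entity, relation label, tail entity)


def strict_prf(gold: set[Relation], pred: set[Relation]) -> tuple[float, float, float]:
    # Assumed strict matching: exact equality of both entities and the relation label.
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example sentence and annotations.
doc = "Patients with Fabry disease show weakness of the arms and legs."
disease = Entity("RAREDISEASE", ((14, 27),))       # "Fabry disease"
arms = Entity("SIGN", ((33, 53),))                 # "weakness of the arms"
legs = Entity("SIGN", ((33, 48), (58, 62)))        # discontinuous: "weakness of the ... legs"

gold = {(disease, "produces", arms), (disease, "produces", legs)}
# A system that cannot recover the discontinuous mention predicts only "legs".
pred = {(disease, "produces", arms), (disease, "produces", Entity("SIGN", ((58, 62),)))}

print(legs.text(doc))          # weakness of the legs
print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Because `arms` and `legs` share the fragment "weakness of the", the span-tuple representation also covers overlapping mentions, which a single BIO tag per token cannot express directly.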
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.77
Funding Information
This study was supported in 2020 by the National Institutes of Health under grant no. R01LM013240.
Recommended Citation
Gupta, Shashank, "LANGUAGE MODELS FOR RARE DISEASE INFORMATION EXTRACTION: EMPIRICAL INSIGHTS AND MODEL COMPARISONS" (2024). Theses and Dissertations--Computer Science. 142.
https://uknowledge.uky.edu/cs_etds/142
Included in
Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Data Science Commons, Engineering Commons, Software Engineering Commons
