Author ORCID Identifier

https://orcid.org/0009-0003-8496-5271

Date Available

4-24-2024

Year of Publication

2024

Document Type

Master's Thesis

Degree Name

Master of Computer and Information Science (MCIS)

College

Engineering

Department/School/Program

Computer Science

Faculty

Dr. Ramakanth Kavuluru

Faculty

Dr. Simone Silvestri

Abstract

End-to-end relation extraction (E2ERE) is an important task in natural language processing (NLP) that involves identifying entities in text and classifying the semantic relationships between them. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare-disease text, which contains discontinuous and nested entities. We evaluate Named Entity Recognition (NER) to Relation Extraction (RE) pipelines, sequence-to-sequence models, and generative pre-trained transformer (GPT) models on the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and lag well behind pipeline models. These results also hold on a second E2ERE dataset, for chemical-protein interactions.
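To make the NER-to-RE pipeline paradigm and the notion of discontinuous and nested entities concrete, here is a minimal sketch. It is not the thesis's implementation: the entity is represented as a tuple of character spans (more than one span means a discontinuous mention), the stage-2 relation classifier scores every ordered pair of stage-1 entities, and the `toy_ner` and `toy_re` functions are hypothetical rule-based stand-ins for trained models. The example sentence, label names (`RAREDISEASE`, `SIGN`, `produces`), and helper names are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    label: str
    # Tuple of (start, end) character offsets; more than one span = discontinuous.
    spans: tuple

    def text(self, doc: str) -> str:
        return " ".join(doc[s:e] for s, e in self.spans)


def nested(inner: Entity, outer: Entity) -> bool:
    """True if every span of `inner` lies inside some span of `outer`."""
    return inner != outer and all(
        any(os <= s and e <= oe for os, oe in outer.spans) for s, e in inner.spans
    )


def pipeline(doc, ner, re_clf):
    """Stage 1: NER predicts entities. Stage 2: RE labels each ordered entity pair."""
    entities = ner(doc)
    triples = []
    for head, tail in permutations(entities, 2):
        rel = re_clf(doc, head, tail)
        if rel is not None:  # None means "no relation" for this pair
            triples.append((head.text(doc), rel, tail.text(doc)))
    return triples


# Hypothetical example sentence and gold-style entities.
doc = "Disease X causes muscle weakness and pain"
disease = Entity("RAREDISEASE", ((0, 9),))             # "Disease X"
weakness = Entity("SIGN", ((17, 32),))                 # "muscle weakness"
muscle_pain = Entity("SIGN", ((17, 23), (37, 41)))     # discontinuous: "muscle pain"


def toy_ner(d):
    # Stand-in for a trained NER model.
    return [disease, weakness, muscle_pain]


def toy_re(d, head, tail):
    # Stand-in for a trained RE classifier.
    if head.label == "RAREDISEASE" and tail.label == "SIGN":
        return "produces"
    return None


if __name__ == "__main__":
    for triple in pipeline(doc, toy_ner, toy_re):
        print(triple)
```

Keeping the two stages as separate callables mirrors the pipeline design: the NER and RE models can be trained and swapped independently, whereas sequence-to-sequence and GPT approaches generate entities and relations jointly.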

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.77

Funding Information

This study was supported by National Institutes of Health grant no. R01LM013240 in 2020.

Recommended Citation

Gupta, Shashank, "LANGUAGE MODELS FOR RARE DISEASE INFORMATION EXTRACTION: EMPIRICAL INSIGHTS AND MODEL COMPARISONS" (2024). Theses and Dissertations--Computer Science. 142.
https://uknowledge.uky.edu/cs_etds/142

Included in

Artificial Intelligence and Robotics Commons, Bioinformatics Commons, Data Science Commons, Engineering Commons, Software Engineering Commons

Theses and Dissertations--Computer Science

LANGUAGE MODELS FOR RARE DISEASE INFORMATION EXTRACTION: EMPIRICAL INSIGHTS AND MODEL COMPARISONS