Date Available
12-8-2025
Year of Publication
2025
Document Type
Doctoral Dissertation
Degree Name
Doctor of Philosophy (PhD)
College
Engineering
Department/School/Program
Computer Science
Faculty
Ramakanth Kavuluru
Faculty
Simone Silvestri
Abstract
Biomedical relation extraction is a key task in information extraction supporting downstream applications such as knowledge discovery, information retrieval, and question answering. As such, improvements in relation extraction (RE) are expected to have major implications for other information needs in biomedicine. In this dissertation, we pursue biomedical relation extraction in several advanced settings with recent advances in large language models (LLMs) and retrieval-augmented generation (RAG). First, we address an important slot filling task on social media that arose during the COVID-19 pandemic, using the idea of continuous prompts with encoder models. Next, we handle a tricky dynamic n-ary relation extraction setting for extracting combination drug therapies using the so-called Seq2Rel paradigm. We then show how predicate descriptions can help with relation extraction using a bi-encoder setup, leading to improvements in both relation classification and end-to-end configurations. Since RE tends to be a local extraction task exclusively focused on input text, it is not straightforward to imbue external knowledge without adversely affecting the extraction accuracy. We handle this in a fourth project by augmenting retrieved chunks of text that contribute to the RE without overwhelming the signal in the original inputs. Finally, we show that reasoning traces distilled from open-source large reasoning models (LRMs) can enhance RE performance even when traces are obtained in an unsupervised manner where the gold label is not available. This project demonstrates that unlabeled instances can still contribute to RE improvements even when some of the zero-shot extractions are incorrect. Overall, this dissertation makes multiple contributions to further methodological innovations in biomedical RE.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2025.558
Funding Information
This research was supported by the National Library of Medicine of the National Institutes of Health under Award Number R01 LM013240 during the years 2021–2025.
Recommended Citation
Jiang, Yuhang, "Biomedical Relation Extraction in the Era of (Large) Language Models" (2025). Theses and Dissertations--Computer Science. 153.
https://uknowledge.uky.edu/cs_etds/153
