Author ORCID Identifier

https://orcid.org/0000-0002-9893-4084

Date Available

8-20-2026

Year of Publication

2025

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Medicine

Department/School/Program

Clinical and Translational Science

Faculty

Daniel Harris

Faculty

Sharon Walsh

Faculty

Claire Clark

Abstract

The ongoing opioid overdose crisis in the United States requires timely and accurate surveillance systems to inform public health responses. Traditional public health surveillance methods rely on hospital discharge data and death certificates, which suffer from significant reporting delays and miss cases where patients refuse hospital transportation. Emergency Medical Services (EMS) data presents a promising alternative with advantages in timeliness and case ascertainment but lacks validated definitions for suspected opioid overdose (SOO).

This dissertation addresses this critical gap through the development, validation, and fairness assessment of machine learning models with natural language processing (ML-NLP) for identifying SOOs in EMS data. The research extends existing expert-driven knowledge-based (KB) definitions by leveraging rich narrative text in EMS records to improve classification accuracy while examining potential algorithmic bias across socioeconomic and demographic groups.

In the first study, a sample of 2,327 Kentucky EMS encounters from 2018-2022 underwent expert review to establish ground-truth SOO labels. Five established KB definitions were evaluated against novel ML-NLP approaches using random forest models. Results demonstrated that ML-NLP models outperformed KB definitions, with the full-featured model (combining structured and unstructured data) achieving the highest F1-score (0.81) compared to the best KB definition (0.77). The ML-NLP model demonstrated superior precision (0.82 vs. 0.69) while maintaining comparable sensitivity, underscoring the value of integrating domain-specific knowledge with advanced analytical techniques to enhance SOO surveillance.
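
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision, sensitivity, and F1-score can be computed when a classifier's output is compared against expert-reviewed labels. The function name and the toy labels are hypothetical illustrations, not the dissertation's code or data.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, sensitivity (recall), and F1 for binary labels.

    y_true: expert-reviewed labels (1 = suspected opioid overdose, 0 = not)
    y_pred: labels assigned by a definition or model
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


# Hypothetical toy example (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 1, 0]
metrics = classification_metrics(toy_true, toy_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and sensitivity, a definition with high sensitivity but low precision is penalized, which is why it is a common single-number summary when comparing case definitions.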

The second study examined potential algorithmic bias across demographic groups and neighborhood Social Vulnerability Index (SVI) quartiles. Using a demographically balanced dataset with oversampling of Black patients, various model designs were evaluated for fairness. While modest disparities were observed in classification performance, the SVI-inclusive ML-NLP model (which incorporates the SVI of the incident location) demonstrated the most balanced performance. Fairness metrics indicated minimal systemic bias in the optimized models, particularly when integrating both race and social vulnerability features. Notably, the performance gains over the highest-performing KB definition were relatively modest, suggesting that well-designed expert-driven approaches remain viable alternatives to computationally intensive methods.
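
The abstract does not name the specific fairness metrics used. As one common illustration of this kind of assessment, the sketch below compares sensitivity across groups and reports the largest gap between any two groups. The function names, group labels, and toy data are hypothetical assumptions, not the dissertation's method or results.

```python
def sensitivity_by_group(y_true, y_pred, groups):
    """Return sensitivity (true positive rate) for each group.

    Groups with no true positive cases are omitted, since sensitivity
    is undefined for them.
    """
    counts = {}  # group -> [true positives, actual positives]
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            tp_and_pos = counts.setdefault(g, [0, 0])
            tp_and_pos[1] += 1
            if p == 1:
                tp_and_pos[0] += 1
    return {g: tp / pos for g, (tp, pos) in counts.items()}


def max_sensitivity_gap(rates):
    """Largest difference in sensitivity between any two groups."""
    values = list(rates.values())
    return max(values) - min(values) if values else 0.0


# Hypothetical toy example with two placeholder groups, "A" and "B".
toy_true = [1, 1, 0, 1, 1, 0]
toy_pred = [1, 0, 0, 1, 1, 1]
toy_groups = ["A", "A", "A", "B", "B", "B"]
rates = sensitivity_by_group(toy_true, toy_pred, toy_groups)
print(rates, max_sensitivity_gap(rates))
```

A gap near zero suggests that true cases are detected at similar rates across groups; the same pattern can be applied to SVI quartiles by passing the quartile as the group label.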

The dissertation concludes by exploring practical implementation, addressing technical infrastructure requirements, workforce training needs, and organizational barriers within public health agencies. A framework for responsible implementation balances improved surveillance capabilities with equity considerations.

This research makes three significant contributions: (1) establishing a methodological framework for validating EMS-based opioid overdose surveillance, (2) demonstrating performance advantages of ML-NLP over traditional rule-based approaches, and (3) providing fairness assessment methodologies essential for responsible implementation. The findings support the integration of advanced analytics into public health surveillance while emphasizing the need for ongoing assessment of algorithmic fairness when evaluating surveillance definitions.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2025.417

Funding Information

This study was supported by the Centers for Disease Control and Prevention Overdose Data to Action Grant (no. NU17CE924971-01-01) from 2019-2022, the National Institutes of Health's National Institute on Drug Abuse Rapid Actionable Data for Opioid Response in Kentucky (no. R01 DA057605-01), and an associated supplemental grant focused on artificial intelligence fairness (no. R01 DA057605-01S2).

Available for download on Thursday, August 20, 2026
