Author ORCID Identifier

Year of Publication


Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation


Public Health


Epidemiology and Biostatistics

First Advisor

Dr. Anna M. Kucharska-Newton

Second Advisor

Dr. Erin L. Abner


In the United States, the prevalence of long-term exposure to opioid drugs, for both medically and nonmedically indicated purposes, has increased considerably since the mid-1990s. Concerns have emerged about the potential health effects of opioid use. There is also growing interest in possible connections between opioid use and other conditions, including cardiovascular disease. Electronic health records (EHR) contain information about patient care in the form of structured codes and unstructured notes. Natural language processing (NLP) provides a tool for processing unstructured textual data in EHR clinical notes and extracting useful information in structured formats for research. The purpose of this dissertation was 1) to summarize the peer-reviewed literature on the association between non-acute opioid use and cardiovascular disease (CVD) and identify gaps in research on this topic; 2) to apply an NLP algorithm to estimate the extent of opioid use disorder (OUD) among hospital inpatients that cannot be identified using ICD-10-CM codes; and 3) to determine the extent to which estimates of the association between OUD and CVD may be biased by misclassification of OUD cases that are not identifiable using ICD-10-CM codes.

First, we conducted a scoping review of the epidemiological literature on non-acute opioid use (NOU) and CVD. We summarized the current evidence on the association between NOU and CVD and identified open questions on this topic. Then, we developed an NLP algorithm to identify cases of OUD in electronic health records that were not assigned an ICD-10-CM code for OUD by medical records coders, but for which strong evidence of OUD exists in the unstructured clinical notes. Lastly, we estimated the association between OUD and six types of CVD (arrhythmia, myocardial infarction, stroke, heart failure, ischemic heart disease, and infective endocarditis), classifying OUD in two ways: defining OUD cases by ICD-10-CM codes alone, and using a combination of cases identified by ICD-10-CM codes and cases identified by the NLP algorithm. We assessed the effect of misclassification of OUD status when using ICD-10-CM codes alone.

Digital Object Identifier (DOI)