Theses and Dissertations--Computer Science

APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION

Yuan Lu, University of KentuckyFollow

Date Available

2-12-2015

Year of Publication

2015

Degree Name

Master of Science (MS)

Document Type

Master's Thesis

College

Engineering

Department/School/Program

Computer Science

First Advisor

Dr. Ramakanth Kavuluru

Abstract

Many manual biomedical annotation tasks can be categorized as instances of the typical multi-label classification problem where several categories or labels from a fixed set need to assigned to an input instance. MeSH term assignment to biomedical articles and diagnosis code extraction from medical records are two such tasks. To address this problem automatically, in this thesis, we present a way to utilize latent associations between labels based on output label sets. We used random indexing as a method to determine latent associations and use the associations as a novel feature in a learning-to-rank algorithm that reranks candidate labels selected based on either k-NN or binary relevance approach. Using this new feature as part of other features, for MeSH term assignment, we train our ranking model on a set of 200 documents, test it on two public datasets, and obtain new state-of-the-art results in precision, recall, and mean average precision. In diagnosis code extraction, we reach an average micro F-score of 0.478 based on a large EMR dataset from the University of Kentucky Medical Center, the first study of its kind to our knowledge. Our study shows the advantages and potential of random indexing method in determining and utilizing implicit relationships between labels in multi-label classification problems.

Recommended Citation

Lu, Yuan, "APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION" (2015). Theses and Dissertations--Computer Science. 30.
https://uknowledge.uky.edu/cs_etds/30

Download

Included in

Other Computer Engineering Commons

COinS

Theses and Dissertations--Computer Science

APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION

Date Available

Year of Publication

Degree Name

Document Type

College

Department/School/Program

First Advisor

Abstract

Recommended Citation

Included in

Search

Browse by Author

Author Corner

Connect

Theses and Dissertations--Computer Science

APPLICATION OF RANDOM INDEXING TO MULTI LABEL CLASSIFICATION PROBLEMS: A CASE STUDY WITH MESH TERM ASSIGNMENT AND DIAGNOSIS CODE EXTRACTION

Author

Date Available

Year of Publication

Degree Name

Document Type

College

Department/School/Program

First Advisor

Abstract

Recommended Citation

Included in

Share

Search

Browse by Author

Author Corner

Connect