Kentucky Cancer Registry Faculty Publications

Using Case-Level Context to Classify Cancer Pathology Reports

Shang Gao, Oak Ridge National Laboratory
Mohammed Alawad, Oak Ridge National Laboratory
Noah Schaefferkoetter, Oak Ridge National Laboratory
Lynne Penberthy, National Cancer Institute
Xiao-Cheng Wu, Louisiana State University
Eric B. Durbin, University of Kentucky
Linda Coyle, Information Management Services Inc.
Arvind Ramanathan, Argonne National Laboratory
Georgia Tourassi, Oak Ridge National Laboratory

Abstract

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied towards a wide range of other clinical text-based tasks.

Document Type

Article

Publication Date

5-12-2020

Notes/Citation Information

Published in PLOS ONE, v. 15, no. 5, 0232840, p. 1-21.

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Digital Object Identifier (DOI)

https://doi.org/10.1371/journal.pone.0232840

Funding Information

Georgia Tourassi (GT) at the Oak Ridge National Laboratory received funding from the Department of Energy (energy.gov) and the National Cancer Institute (cancer.gov). The grant number is 2450-Z301-19. These funds were used to facilitate this study. The funding provided via this grant was used to support salaries for SG, MA, NS, AR, and GT. In addition to the grant, the National Cancer Institute (NCI) employs or provides funding for authors from NCI (LP), state registries (XCW and EBD), and Information Management Services Inc. (LC) as part of the Surveillance, Epidemiology, and End Results program and authorized their participation in this study. Their efforts included data collection, cleaning, analysis, or final review of this study. This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Repository Citation

Gao, Shang; Alawad, Mohammed; Schaefferkoetter, Noah; Penberthy, Lynne; Wu, Xiao-Cheng; Durbin, Eric B.; Coyle, Linda; Ramanathan, Arvind; and Tourassi, Georgia, "Using Case-Level Context to Classify Cancer Pathology Reports" (2020). Kentucky Cancer Registry Faculty Publications. 1.
https://uknowledge.uky.edu/kcr_facpub/1

pone.0232840.s001.tif (1409 kB)
S1 Fig. https://doi.org/10.1371/journal.pone.0232840.s001
pone.0232840.s002.tif (1394 kB)
S2 Fig. Histogram of number of pathology reports associated with each unique tumor ID. https://doi.org/10.1371/journal.pone.0232840.s002
pone.0232840.s003.pdf (80 kB)
S1 Table. McNemar’s tests of statistical significance. https://doi.org/10.1371/journal.pone.0232840.s003
pone.0232840.s004.pdf (78 kB)
S2 Table. Case-level context F-score breakdown by class. https://doi.org/10.1371/journal.pone.0232840.s004
pone.0232840.s005.pdf (73 kB)
S3 Table. Modular vs end-to-end training. https://doi.org/10.1371/journal.pone.0232840.s005

Included in

Computational Engineering Commons, Data Science Commons, Oncology Commons
