Kentucky Cancer Registry Faculty Publications

Limitations of Transformers on Clinical Text Classification

Shang Gao, Oak Ridge National Laboratory
Mohammed Alawad, Oak Ridge National Laboratory
Michael Todd Young, Oak Ridge National Laboratory
John Gounley, Oak Ridge National Laboratory
Noah Schaefferkoetter, Oak Ridge National Laboratory
Hong-Jun Yoon, Oak Ridge National Laboratory
Xiao-Cheng Wu, Louisiana State University
Eric B. Durbin, University of Kentucky
Jennifer Doherty, University of Utah
Antoinette Stroup, Rutgers Cancer Institute of New Jersey
Linda Coyle, Information Management Services Inc
Georgia D. Tourassi, Oak Ridge National Laboratory

Abstract

Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state of the art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures -- a word-level convolutional neural network and a hierarchical self-attention network -- and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT -- pretraining and WordPiece tokenization -- may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases than on understanding the contextual meaning of sequences of text.
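
The abstract does not spell out the four scaling methods, so the following is only an illustrative sketch of one common way to work around a fixed input limit: split the document into windows, score each window, and pool the scores. It is an assumption for illustration and not necessarily one of the authors' methods. The names `split_into_chunks`, `classify_long_document`, and `toy_keyword_scorer` are hypothetical, the 512-token window assumes BERT's usual WordPiece limit, and the keyword scorer merely stands in for a real fine-tuned model.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], max_len: int = 512,
                      stride: int = 512) -> List[List[str]]:
    """Split a token sequence into windows of at most max_len tokens.

    A stride smaller than max_len yields overlapping windows.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the current window already reaches the end of the document
    return chunks


def classify_long_document(tokens: Sequence[str],
                           score_chunk: Callable[[List[str]], List[float]],
                           max_len: int = 512, stride: int = 512) -> int:
    """Score every window, max-pool the scores per label, return the best label index."""
    chunks = split_into_chunks(tokens, max_len, stride)
    if not chunks:
        raise ValueError("cannot classify an empty document")
    per_chunk = [score_chunk(chunk) for chunk in chunks]
    num_labels = len(per_chunk[0])
    # Max pooling: a label wins if any single window supports it strongly.
    pooled = [max(scores[k] for scores in per_chunk) for k in range(num_labels)]
    return max(range(num_labels), key=lambda k: pooled[k])


def toy_keyword_scorer(chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for a model: label 0 = 'benign', label 1 = 'carcinoma'."""
    return [float(chunk.count("benign")), float(chunk.count("carcinoma"))]


# A 1,200-token document whose only informative word falls in the last window.
doc = ["filler"] * 1000 + ["carcinoma"] + ["filler"] * 199
label = classify_long_document(doc, toy_keyword_scorer)
```

Max pooling is used here because, as the abstract notes, the correct label may hinge on a few key words or phrases; averaging over windows would dilute a signal that appears in only one of them.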

Document Type

Article

Publication Date

2-26-2021

Notes/Citation Information

Published in IEEE Journal of Biomedical and Health Informatics.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

Digital Object Identifier (DOI)

https://doi.org/10.1109/JBHI.2021.3062322

Funding Information

This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

Repository Citation

Gao, Shang; Alawad, Mohammed; Young, Michael Todd; Gounley, John; Schaefferkoetter, Noah; Yoon, Hong-Jun; Wu, Xiao-Cheng; Durbin, Eric B.; Doherty, Jennifer; Stroup, Antoinette; Coyle, Linda; and Tourassi, Georgia D., "Limitations of Transformers on Clinical Text Classification" (2021). Kentucky Cancer Registry Faculty Publications. 3.
https://uknowledge.uky.edu/kcr_facpub/3

Included in

Computer Sciences Commons, Oncology Commons
