Author ORCID Identifier

https://orcid.org/0000-0001-7207-5149

Date Available

12-19-2023

Year of Publication

2023

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Engineering

Department/School/Program

Computer Science

First Advisor

Brent Harrison

Second Advisor

Nathan Jacobs

Abstract

The attention mechanism, an approach for capturing both local and global features of the input, is the crucial element of the Transformer. This dissertation explores structured attention for image analysis, proposing attention-based methods for multi-label learning and Alzheimer’s Disease (AD) diagnosis.
For the multi-label learning task, I present two works under the Vision Transformer (ViT) framework. The first work focuses on supervised learning of multi-label classification. I address the problems of multi-label classification and propose a model named AssocFormer, which uses an association module to capture the association relationships among objects and thereby improve model performance. The second work addresses the semi-supervised learning of multi-label classification. I work on Single-Positive Multi-Label Learning (SPML), an extremely challenging task in which only one positive label is known and the remaining annotations are unknown. I present VLPL, a novel and efficient framework that leverages the similarity of the visual and text embeddings to obtain pseudo-labels for a given image.
In the context of AD diagnosis, this dissertation addresses two tasks. The first task centers on efficient training using 3D brain images of AD. A novel module is proposed, which transforms 3D brain images into 2D fused images across the slice dimension. This conversion reduces input image dimensions, enhancing training efficiency. The second task combines different positron emission tomography (PET) modalities under the ViT structure for AD diagnosis, in a model named ADViT.
Throughout this work, I propose a collection of novel methods rooted in the attention framework. The results demonstrate that these methods yield significant improvements in computer vision and medical imaging analysis.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2023/477
