Author ORCID Identifier

https://orcid.org/0009-0006-6450-2837

Date Available

5-1-2025

Year of Publication

2025

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Arts and Sciences

Department/School/Program

Mathematics

Faculty

Dr. Qiang Ye

Abstract

This dissertation investigates novel architectures that address fundamental challenges in machine learning, focusing on transformer models, recurrent neural networks, generative adversarial networks (GANs), and continual learning, with applications in natural language processing and computer vision.

We first propose the Neumann-Cayley Gated Recurrent Unit (NC-GRU), which uses a Neumann series-based scaled Cayley transformation to maintain orthogonal weight matrices, mitigating the exploding gradient problem while improving long-term memory retention across prediction tasks. We demonstrate a practical application of NC-GRU by incorporating the architecture into an autoencoder that derives neural molecular fingerprints.

Building on these advances, we turn to the transformer architecture, which has revolutionized natural language processing since its introduction in 2017 by achieving state-of-the-art performance across numerous tasks. These models nevertheless face significant scaling limitations on long sequences because the computational complexity of self-attention is quadratic in sequence length. To address this constraint, we propose the Compact Recurrent Transformer (CRT), an architecture that combines shallow transformers, which process local context within each segment, with a recurrent neural network that maintains a persistent memory vector capturing global information across sequence segments.

The dissertation also explores generative modeling through a novel GRU-GAN architecture that generates topside welding images conditioned on bottom images. We show that incorporating sequential information from multiple consecutive bottom images substantially improves the quality of the generated outputs, and we further refine the model to use waveform information as an additional conditioning component, yielding further improvements in image generation quality.

Finally, we address the challenge of catastrophic forgetting in continual learning by introducing two gradient-based sample selection strategies, CRUST and cosine-CRUST, which differentiate between clean and noisy data points by analyzing gradient distributions, enabling the model to retain critical information while adapting to new data. Our methods keep the computation efficient by focusing on the gradients of the network's final layer and by employing clustering techniques to identify and preserve representative clean samples.
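To make the orthogonality mechanism behind NC-GRU concrete, the identities below record the standard scaled Cayley parametrization and its Neumann series approximation. The specific scaling matrix $D$ and the truncation order $K$ are illustrative assumptions here, since the abstract does not spell them out.

For a skew-symmetric matrix $A$ (so $A^\top = -A$), the scaled Cayley transform
\[
W = (I + A)^{-1}(I - A)\,D,
\]
where $D$ is a fixed diagonal matrix with entries $\pm 1$, is orthogonal: $W^\top W = I$. When $\|A\| < 1$, the inverse admits the convergent Neumann series
\[
(I + A)^{-1} = \sum_{k=0}^{\infty} (-A)^{k} \approx \sum_{k=0}^{K} (-A)^{k},
\]
so truncating at a small order $K$ yields an inversion-free update that keeps the recurrent weight matrix approximately orthogonal throughout training.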
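As an illustration of the segment-plus-memory design described for the CRT, here is a minimal PyTorch sketch of one way a shallow transformer can be paired with a GRU-driven memory vector. The module name, the memory-token scheme, and all dimensions are hypothetical, not the dissertation's implementation.

import torch
import torch.nn as nn

class CompactRecurrentBlock(nn.Module):
    # Hypothetical sketch: a shallow transformer encodes each segment
    # together with a prepended memory token; a GRU cell then updates
    # the memory from that token's output before the next segment.
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.memory_update = nn.GRUCell(d_model, d_model)

    def forward(self, segments, memory):
        # segments: list of (batch, seg_len, d_model); memory: (batch, d_model)
        outputs = []
        for seg in segments:
            x = torch.cat([memory.unsqueeze(1), seg], dim=1)  # prepend memory token
            h = self.encoder(x)
            memory = self.memory_update(h[:, 0], memory)      # update from memory slot
            outputs.append(h[:, 1:])                          # per-token outputs
        return torch.cat(outputs, dim=1), memory

# Usage: four segments of length 16 are processed with per-segment
# attention cost, while the memory vector carries global context.
block = CompactRecurrentBlock()
segs = [torch.randn(2, 16, 128) for _ in range(4)]
out, mem = block(segs, torch.zeros(2, 128))  # out: (2, 64, 128)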
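For the GRU-GAN, one natural reading of "conditioning on multiple consecutive bottom images" is to summarize the image sequence with a GRU and feed the final hidden state to the generator alongside noise; waveform features could join the conditioning vector the same way. The sketch below follows that reading, and every name and shape in it is an illustrative assumption.

import torch
import torch.nn as nn

class SequenceConditionedGenerator(nn.Module):
    # Hypothetical sketch: a GRU encodes a sequence of flattened
    # bottom-side images; the generator maps (noise, summary) to a
    # flattened topside image.
    def __init__(self, img_dim=64 * 64, noise_dim=100, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(img_dim, hidden, batch_first=True)
        self.generator = nn.Sequential(
            nn.Linear(noise_dim + hidden, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh(),
        )

    def forward(self, bottom_seq, noise):
        # bottom_seq: (batch, n_frames, img_dim); noise: (batch, noise_dim)
        _, h = self.encoder(bottom_seq)      # h: (1, batch, hidden)
        cond = h.squeeze(0)                  # sequence summary as conditioning
        return self.generator(torch.cat([noise, cond], dim=1))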
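The last-layer gradient selection in CRUST and cosine-CRUST can be sketched as follows. For softmax cross-entropy, the gradient of the per-sample loss with respect to the final-layer logits is softmax(z) - onehot(y), a standard identity, so per-sample gradient signatures are cheap to compute without full backpropagation. The clustering-and-selection details below (k-means, nearest-to-centroid retention, the keep_per_cluster parameter) are illustrative assumptions rather than the dissertation's exact procedure.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def select_clean_samples(logits, labels, n_clusters=10, keep_per_cluster=50, cosine=False):
    """Keep samples whose last-layer gradient signatures sit near a cluster center.

    logits: (n, c) pre-softmax outputs; labels: (n,) integer class labels.
    For softmax cross-entropy, d(loss)/d(logits) = softmax(logits) - onehot(labels).
    """
    n, _ = logits.shape
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grads = probs.copy()
    grads[np.arange(n), labels] -= 1.0               # per-sample last-layer gradient

    feats = normalize(grads) if cosine else grads    # cosine variant: unit-normalize rows
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)

    dists = np.linalg.norm(feats - km.cluster_centers_[km.labels_], axis=1)
    keep = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]
        order = members[np.argsort(dists[members])]
        keep.extend(order[:keep_per_cluster])        # nearest-to-center = presumed clean
    return np.array(sorted(keep))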

Digital Object Identifier (DOI)

https://doi.org/10.13023/vjn3-x807

Funding Information

This research was supported by National Science Foundation grants IIS-2327113 and DMS-2208314.
