Author ORCID Identifier

https://orcid.org/0000-0002-1907-139X

Date Available

4-17-2023

Year of Publication

2023

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Arts and Sciences

Department/School/Program

Mathematics

Advisor

Dr. Qiang Ye

Abstract

Normalization methods have proven to be an invaluable tool in the training of deep neural networks. In particular, Layer Normalization and Batch Normalization are commonly used to mitigate the risks of exploding and vanishing gradients. This work presents two methods related to these normalization techniques. The first is Batch Normalized Preconditioning (BNP) for recurrent neural networks (RNNs) and graph convolutional networks (GCNs). BNP was previously suggested for fully connected and convolutional networks as a way to achieve performance benefits similar to those of Batch Normalization by controlling the condition number of the Hessian through preconditioning of the gradients. We extend this work to RNNs and GCNs, two architectures that are prone to high computational costs and therefore benefit from the training acceleration provided by BNP. The second method is Assorted-Time Normalization (ATN), a normalization technique designed for sequential problems. It combines information from the hidden layers of the model with temporal information across the sequence dimension, remedying a weakness of Layer Normalization in these applications.
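
To illustrate the distinction the abstract draws between Layer Normalization and ATN, the following minimal NumPy sketch contrasts per-timestep normalization over the hidden dimension with a hypothetical ATN-style variant that pools statistics over both the hidden and sequence dimensions. The function names, the causal pooling window, and the omission of learnable scale and shift parameters are assumptions made for illustration, not the dissertation's implementation.

```python
import numpy as np

def layer_norm(h_t, eps=1e-5):
    """Standard Layer Normalization: statistics over the hidden dimension
    of a single timestep's hidden state h_t (shape: (hidden,))."""
    mean = h_t.mean()
    var = h_t.var()
    return (h_t - mean) / np.sqrt(var + eps)

def assorted_time_norm(h, t, eps=1e-5):
    """Hypothetical ATN-style sketch: pool statistics over the hidden
    dimension and over all timesteps up to t, reflecting the abstract's
    description of combining hidden-layer and temporal information.
    h has shape (time, hidden); the pooling window is an assumption."""
    window = h[: t + 1]
    mean = window.mean()
    var = window.var()
    return (h[t] - mean) / np.sqrt(var + eps)

# Toy usage: normalize the hidden state at step 3 of a random sequence.
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 8))        # (time, hidden)
out_ln = layer_norm(h[3])           # LayerNorm sees only step 3
out_atn = assorted_time_norm(h, 3)  # ATN-style sketch sees steps 0..3
```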

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2023.090

Funding Information

National Science Foundation Grant DMS-1821144 (2021)
