Author ORCID Identifier

https://orcid.org/0000-0002-1417-5042

Date Available

5-23-2022

Year of Publication

2022

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Arts and Sciences

Department/School/Program

Mathematics

Advisor

Dr. Qiang Ye

Abstract

Batch normalization (BN) is a ubiquitous method in deep learning that has been shown to decrease training time and improve the generalization performance of neural networks. Despite its success, BN is not theoretically well understood, and it is not suitable for use with very small mini-batch sizes or online learning. In this work, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer, as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the conditioning of the Hessian matrix of the loss function and hence convergence during training. One benefit is that BNP is not constrained by the mini-batch size and works in the online learning setting. We also extend this technique to Bayesian neural networks, which place probability distributions over the weights and biases instead of using single fixed values. In particular, we apply BNP to stochastic gradient Langevin dynamics (SGLD), a standard sampling technique for uncertainty estimation in Bayesian neural networks.
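
For readers unfamiliar with SGLD, the following is a minimal sketch of the plain, unpreconditioned sampler on a toy problem: inferring the mean of Gaussian data under a Gaussian prior. It is not the dissertation's BNP method. The function name `sgld_sample`, the toy model, and all hyperparameter values are illustrative assumptions rather than anything taken from the work itself.

```python
import numpy as np

def sgld_sample(data, n_steps=5000, step_size=1e-3, batch_size=10,
                prior_var=10.0, noise_var=1.0, seed=0):
    """Plain SGLD for the toy model x_i ~ N(theta, noise_var), theta ~ N(0, prior_var).

    Hypothetical illustration only; no preconditioning is applied.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Draw a mini-batch without replacement.
        batch = rng.choice(data, size=batch_size, replace=False)
        # Gradient of the log prior.
        grad_log_prior = -theta / prior_var
        # Mini-batch estimate of the log-likelihood gradient, rescaled to the full data set.
        grad_log_lik = (n / batch_size) * np.sum(batch - theta) / noise_var
        grad = grad_log_prior + grad_log_lik
        # SGLD update: half a gradient step plus Gaussian noise with variance step_size.
        theta = theta + 0.5 * step_size * grad + np.sqrt(step_size) * rng.standard_normal()
        samples[t] = theta
    return samples

# Usage: compare the sample average with the analytic posterior mean.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=100)
samples = sgld_sample(data)
posterior_mean = data.sum() / (1.0 / 10.0 + len(data))
print(samples[1000:].mean(), posterior_mean)
```

With a constant step size the samples only approximate the posterior, so the two printed values should be close but not identical. A preconditioned variant would rescale both the gradient term and the injected noise; how BNP does this is described in the dissertation, not here.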

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2022.182

Recommended Citation

Lange, Susanna Luisa Gertrude, "Batch Normalization Preconditioning for Neural Network Training" (2022). Theses and Dissertations--Mathematics. 88.
https://uknowledge.uky.edu/math_etds/88

Included in

Data Science Commons, Other Mathematics Commons
