Electrical and Computer Engineering Faculty Publications

Advanced Recurrent Network-Based Hybrid Acoustic Models for Low Resource Speech Recognition

Jian Kang, Tsinghua University, China
Wei-Qiang Zhang, Tsinghua University, China
Wei-Wei Liu, Chinese People’s Liberation Army, China
Jia Liu, Tsinghua University, China
Michael T. Johnson, University of Kentucky

Abstract

Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem and have achieved excellent results. Bidirectional LSTM (BLSTM), which uses both preceding and following context, has shown particularly good performance. However, BLSTM approaches are computationally heavy, even when implemented efficiently on GPU-based high-performance computers. In addition, because the output of LSTM units is bounded, a vanishing gradient issue often remains across multiple layers, and the large size of LSTM networks makes them susceptible to overfitting. In this work, we combine a local bidirectional architecture, a new recurrent unit (the gated recurrent unit, GRU), and residual architectures to address these problems. Experiments are conducted on the benchmark datasets released under the IARPA Babel Program. The proposed models achieve 3 to 10% relative improvements over their corresponding DNN or LSTM baselines across seven language collections. In addition, the new models accelerate learning speed by a factor of more than 1.6 compared to conventional BLSTM models. By using these approaches, we achieve good results in the IARPA Babel Program.
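
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of a single GRU layer whose output is added to its input through a residual (skip) connection. It is not the authors' implementation: the gate equations follow the commonly used GRU formulation, the residual form (output = input + hidden state, with equal input and hidden sizes) is an assumption, and the names `gru_step`, `residual_gru_layer`, and `init_params` are hypothetical. The local bidirectional architecture is not shown.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x, h, p):
    """One GRU time step (standard formulation, assumed here).

    z: update gate, r: reset gate, cand: candidate state.
    New state: h' = (1 - z) * h + z * cand.
    """
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(dim, seed=0):
    """Random square weights and zero biases (illustrative values only)."""
    rng = random.Random(seed)
    p = {}
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh"):
        p[name] = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                   for _ in range(dim)]
    for name in ("bz", "br", "bh"):
        p[name] = [0.0] * dim
    return p


def residual_gru_layer(xs, p):
    """Run a GRU over a sequence and add the input back at every frame.

    Assumed residual form: y_t = x_t + h_t. The identity path lets the
    signal (and its gradient) bypass the bounded recurrent output.
    """
    h = [0.0] * len(xs[0])
    outputs = []
    for x in xs:
        h = gru_step(x, h, p)
        outputs.append([xi + hi for xi, hi in zip(x, h)])
    return outputs


if __name__ == "__main__":
    frames = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]  # made-up feature frames
    for y in residual_gru_layer(frames, init_params(3)):
        print(y)
```

Because the GRU state is a convex combination of the previous state and a tanh-bounded candidate, it stays within (-1, 1) when started from zero; the skip path is what allows a stacked layer's output to exceed that range, which is one intuition for why residual connections are paired with bounded recurrent units.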

Document Type

Article

Publication Date

7-17-2018

Notes/Citation Information

Published in EURASIP Journal on Audio, Speech, and Music Processing, v. 6, p. 1-15.

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Digital Object Identifier (DOI)

https://doi.org/10.1186/s13636-018-0128-6

Funding Information

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61370034 and 61403224.

Repository Citation

Kang, Jian; Zhang, Wei-Qiang; Liu, Wei-Wei; Liu, Jia; and Johnson, Michael T., "Advanced Recurrent Network-Based Hybrid Acoustic Models for Low Resource Speech Recognition" (2018). Electrical and Computer Engineering Faculty Publications. 23.
https://uknowledge.uky.edu/ece_facpub/23

Included in

Computational Engineering Commons, Computational Linguistics Commons, Computer Engineering Commons, Electrical and Computer Engineering Commons
