Author ORCID Identifier

https://orcid.org/0000-0002-0981-708X

Date Available

11-20-2020

Year of Publication

2020

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Electrical and Computer Engineering

Faculty

Dr. Michael T. Johnson

Faculty

Dr. Daniel Lau

Abstract

Acoustic-to-Articulatory Inversion, the estimation of articulatory kinematics from speech, is an important problem that has received significant attention in recent years. Articulatory movements estimated by such models can be used in many applications, including speech synthesis, automatic speech recognition, and facial kinematics for talking-head animation devices. Knowledge of the positions of the articulators can also be extremely useful in speech therapy systems and in Computer-Aided Language Learning (CALL) and Computer-Aided Pronunciation Training (CAPT) systems for second-language learners. Acoustic-to-Articulatory Inversion is challenging because of the complexity of articulation patterns and significant inter-speaker differences, and it is even more challenging when applied to non-native speakers for whom no kinematic training data are available. This dissertation addresses these problems by developing improved architectures for Articulatory Inversion. The proposed Articulatory-WaveNet architecture is built on a structure of dilated causal convolutional layers and improves Acoustic-to-Articulatory Inversion estimates in both speaker-dependent and speaker-independent scenarios. The system has been evaluated on the ElectroMagnetic Articulography corpus of Mandarin Accented English (EMA-MAE), which consists of 39 speakers, including both native English speakers and Mandarin-accented English speakers. Results show that Articulatory-WaveNet significantly improves the performance of both speaker-dependent and speaker-independent Acoustic-to-Articulatory Inversion systems compared to previously reported results.
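The abstract names dilated causal convolutions as the core building block. The sketch below illustrates only that general technique, not the dissertation's implementation: each layer computes y[t] = sum_k w[k] * x[t - k*d], so an output never depends on future inputs, and doubling the dilation d per layer grows the receptive field exponentially. The helper names (causal_dilated_conv1d, dilated_stack, receptive_field), the kernel size of 2, and the doubling dilation schedule (as popularized by the original WaveNet) are illustrative assumptions. The gated activations, residual and skip connections, and the actual layer configuration of Articulatory-WaveNet are not described in this record and are omitted.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Hypothetical helper: y[t] = sum_k weights[k] * x[t - k*dilation].

    weights[0] multiplies the current sample x[t]; weights[1] multiplies x[t - dilation], etc.
    Indices before the start of the signal are treated as zeros (causal left padding),
    so y[t] depends only on x[0..t].
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def dilated_stack(x: Sequence[float], weights: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Hypothetical helper: apply one linear causal layer per dilation (no gating or residuals)."""
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv1d(h, weights, d)
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past-and-present input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    # With kernel size 2 and dilations 1, 2, 4, a single input impulse reaches 8 outputs.
    print(dilated_stack(impulse, [1.0, 1.0], [1, 2, 4]))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(receptive_field(2, [1, 2, 4]))  # 8
```

With three layers the receptive field is 1 + (1 + 2 + 4) = 8 samples. A schedule of 1, 2, ..., 512 reaches 1024 samples with only ten layers, which is the usual motivation for dilation over simply stacking ordinary convolutions.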

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2020.466

Funding Information

National Science Foundation CISE Directorate, “RI: Small: Speaker Independent Acoustic-Articulator Inversion for Pronunciation Assessment”, Total Budget $449,643. (Awarded 2013; concluded approximately 2018 after extensions.)
The University of Kentucky, Research Assistantship, Fall 2016 to Summer 2020.
Partially funded by the University of Kentucky, Department of Electrical and Computer Engineering, Teaching Assistantship, Fall 2020.
Partially funded by the University of Kentucky, Research Assistantship, Fall 2020.

Recommended Citation

Agha Seyed Mirza Bozorg, Narjes Alsadat, "Articulatory-WaveNet: Deep Autoregressive Model for Acoustic-to-Articulatory Inversion" (2020). Theses and Dissertations--Electrical and Computer Engineering. 159.
https://uknowledge.uky.edu/ece_etds/159

Included in

Signal Processing Commons

Theses and Dissertations--Electrical and Computer Engineering

Articulatory-WaveNet: Deep Autoregressive Model for Acoustic-to-Articulatory Inversion