Date Available
5-10-2023
Year of Publication
2023
Degree Name
Master of Arts in Linguistic Theory and Typology (MALTT)
Document Type
Master's Thesis
College
Arts and Sciences
Department/School/Program
Linguistic Theory & Typology
First Advisor
Dr. Josef Fruehwald
Abstract
One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor-intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network approaches, especially transformer-based ones, have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying XLSR-53 to Northern Prinmi, a Tibeto-Burman Qiangic language spoken in Southwest China (Daudey & Pincuo, 2020).
Specifically, this thesis aims to answer two questions. First, is the XLSR-53 ASR model useful for first-pass transcription of oral art recordings from Northern Prinmi, an under-resourced tonal language? Second, does preprocessing target transcripts to combine grapheme clusters—multi-character representations of lexical tones and characters with modifying diacritics—into more phonologically salient units improve the model's predictions? Results indicate that—with substantial adaptations—XLSR-53 will be useful for this task, and that preprocessing to combine grapheme clusters does improve model performance.
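The grapheme-cluster preprocessing described above might look roughly like the following Python sketch. It is an illustration only, not the thesis's actual pipeline: the function name `split_clusters`, the choice of Chao tone letters (U+02E5 to U+02E9) as the tone notation, and the merging rules are all assumptions made for the example. The idea is that a base character plus its combining diacritics, or a run of adjacent tone letters, becomes one unit in the model's output vocabulary instead of several.

```python
import unicodedata

# Assumption for illustration: tones are written with Chao tone letters
# (U+02E5..U+02E9). The notation actually used in the thesis may differ.
TONE_LETTERS = {chr(cp) for cp in range(0x02E5, 0x02EA)}


def is_mark(ch):
    """True for Unicode combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def split_clusters(text):
    """Split a transcript into grapheme clusters.

    A combining mark is attached to the preceding character, and a tone
    letter is attached to a directly preceding tone letter, so that each
    cluster can be treated as a single vocabulary item.
    """
    clusters = []
    # NFD separates precomposed characters into base + combining marks,
    # so both encodings of the same character yield the same cluster.
    for ch in unicodedata.normalize("NFD", text):
        prev = clusters[-1] if clusters else ""
        joins_tone = ch in TONE_LETTERS and prev[-1:] in TONE_LETTERS
        if prev and (is_mark(ch) or joins_tone):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def build_vocab(transcripts):
    """Map every cluster seen in the transcripts to an integer id."""
    units = sorted({c for t in transcripts for c in split_clusters(t)})
    return {unit: idx for idx, unit in enumerate(units)}
```

With this sketch, a nasalized vowel followed by a two-letter falling tone is split into two units (the vowel with its diacritic, and the tone contour) rather than four separate characters, which is the kind of reduction the second research question evaluates.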
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2023.202
Recommended Citation
Bechler, Connor, "Automatic Transcription of Northern Prinmi Oral Art: Approaches and Challenges to Automatic Speech Recognition for Language Documentation" (2023). Theses and Dissertations--Linguistics. 51.
https://uknowledge.uky.edu/ltt_etds/51