Year of Publication
Master of Science in Electrical Engineering (MSEE)
Dr. Sen-ching Samson Cheung
Voice transformation refers to a class of techniques that modify the voice characteristics either to conceal the identity or to mimic the voice characteristics of another speaker. Its applications include automatic dialogue replacement and voice generation for people with voice disorders. The diversity in applications makes evaluation of voice transformation a challenging task. The objective of this research is to propose a framework to evaluate intentional voice transformation techniques. Our proposed framework is based on two fundamental qualities: intelligibility and speaker similarity. Intelligibility refers to the clarity of the speech content after voice transformation and speaker similarity measures how well the modified output disguises the source speaker. We measure intelligibility with word error rates and speaker similarity with likelihood of identifying the correct speaker. The novelty of our approach is, we consider whether similarly transformed training data are available to the recognizer. We have demonstrated that this factor plays a significant role in intelligibility and speaker similarity for both human testers and automated recognizers. We thoroughly test two classes of voice transformation techniques: pitch distortion and voice conversion, using our proposed framework. We apply our results for patients with voice hypertension using video self-modeling and preliminary results are presented.
Raghunathan, Anusha, "EVALUATION OF INTELLIGIBILITY AND SPEAKER SIMILARITY OF VOICE TRANSFORMATION" (2011). University of Kentucky Master's Theses. 101.