Automatic Speech Recognition for Low-Resource Pediatric Speech
Jonathan Argudo
Co-Presenters: Individual Presentation
College: Hennings College of Science, Mathematics and Technology
Major: BS.COMPSCI/CYBERS
Faculty Research Mentor: Navya Martin Kollapally
Abstract:
Automatic Speech Recognition (ASR) systems achieve high accuracy on high-resource languages, but their performance on low-resource languages remains an important open question. This study examines whether the acoustic-modeling advantage that transformer-based models show on high-resource languages carries over to low-resource languages. It does so through a comparative study of three ASR models: OpenAI's Whisper, a transformer-based model; Meta's Wav2Vec2, a Connectionist Temporal Classification (CTC)-based model; and SpeechBrain, a hybrid model. To ensure a representative sample, the models are tested on languages spanning different levels of resource availability: Haitian Creole, a low-resource language; Mandarin Chinese, a medium-resource language with tonal variation; and Spanish and Portuguese, high-resource languages. Evaluation is multi-dimensional, combining four metrics: the traditional Word Error Rate (WER) for transcription accuracy, Verb/Noun Ratio for linguistic quality, Average Utterance Length for structural integrity, and Real-Time Factor (RTF) for computational cost. The transformer-based models prove more robust, suggesting that self-attention mechanisms may be better at capturing acoustic and linguistic cues in low-resource settings.
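For reference, two of these metrics have simple, standard definitions: Word Error Rate is the word-level edit distance between reference and hypothesis transcripts, normalized by reference length (WER = (S + D + I) / N), and Real-Time Factor is wall-clock decoding time divided by audio duration. The Python sketch below illustrates both computations under those standard definitions; the function names and the decode callback are illustrative, not taken from the study's codebase.

import time

def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / reference length,
    # computed here as a word-level Levenshtein distance.
    ref, hyp = reference.split(), hypothesis.split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match or substitution
    return dp[-1][-1] / max(len(ref), 1)

def real_time_factor(decode, audio, audio_seconds: float) -> float:
    # RTF = decoding time / audio duration; values below 1.0 mean the model
    # transcribes faster than real time. `decode` is any callable (hypothetical
    # here) that transcribes `audio`.
    start = time.perf_counter()
    decode(audio)
    return (time.perf_counter() - start) / audio_seconds

# Example: one inserted word against a three-word reference gives WER of 1/3.
print(word_error_rate("the cat sat", "the cat sat down"))

A model can thus post a low WER yet a high RTF (accurate but slow), which is why the study reports the two metrics separately rather than folding them into a single score.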