Automatic Speech Recognition for Low-Resource Pediatric Speech

Jonathan Argudo

Co-Presenters: Individual Presentation

College: Hennings College of Science Mathematics and Technology

Major: BS.COMPSCI/CYBERS

Faculty Research Mentor: Martin Kollapally, Navya

Abstract:

Automatic Speech Recognition (ASR) systems achieve high accuracy on high-resource languages, but their performance on low-resource languages remains an important open research area. This paper explores whether the acoustic-modeling advantage that transformer-based models demonstrate on high-resource languages carries over to low-resource languages. It does so through a comparative study of three ASR models: OpenAI's Whisper, a transformer-based model; Meta's Wav2Vec2, a Connectionist Temporal Classification (CTC)-based model; and SpeechBrain, a hybrid model. To ensure a representative sample, the models are tested on four languages spanning different levels of resource availability: Haitian Creole, a low-resource language; Mandarin Chinese, a medium-resource language with tonal variation; and Spanish and Portuguese, high-resource languages. The models are evaluated along multiple dimensions using four metrics: the traditional Word Error Rate (WER); Verb/Noun Ratio, for linguistic quality; Average Utterance Length, for structural integrity; and Real-Time Factor (RTF), for computational cost. The transformer-based model shows greater robustness, suggesting that self-attention mechanisms may be better at capturing acoustic and linguistic cues in low-resource linguistic communities.
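Some of the reported metrics are straightforward to compute. As a minimal illustration (not the study's actual evaluation code), WER is word-level edit distance normalized by reference length, Average Utterance Length is the mean word count per transcript, and RTF is processing time divided by audio duration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

def avg_utterance_length(transcripts: list[str]) -> float:
    """Mean number of words per transcribed utterance."""
    return sum(len(t.split()) for t in transcripts) / len(transcripts)

def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: < 1.0 means faster than real time."""
    return processing_seconds / audio_seconds
```

For example, `wer("the cat sat", "the cat sat on")` yields 1/3 (one inserted word against a three-word reference), and `rtf(5.0, 10.0)` yields 0.5, i.e. decoding at twice real-time speed.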
