Evaluating Automatic Speech Recognition Models: How Well Do They Handle Accents?
Wei Liu
Co-Presenter: Yukun Xiong
College: The Dorothy and George Hennings College of Science, Mathematics and Technology
Major: Computer Science
Faculty Research Mentor: Daehan Kwak
Abstract:
Automatic Speech Recognition (ASR) technology has progressed remarkably, becoming an integral component of virtual assistants, transcription services, and accessibility tools. Despite these advancements, ASR systems still struggle to accurately recognize speech from individuals with different accents and linguistic features. This work analyzes the performance of various ASR models, including cloud-based, local, and integrated speech recognition systems. For evaluation, we use several accented-speech datasets and assess the ASR variants using Word Error Rate (WER) as the primary metric. The datasets include the Speech Accent Archive (SAA), L2-ARCTIC, and an Indian-accent dataset. The results show that ASR accuracy varies with the speaker’s language and accent: OpenAI Whisper, Deepgram, and AssemblyAI perform significantly better than conventional models such as Mozilla DeepSpeech. The results also indicate that many standalone ASR models are optimized for standard, non-regionally accented English, leading to higher error rates for non-native and regionally accented speech. Future developments should focus on augmenting multilingual datasets and refining algorithms to achieve more equitable speech recognition for diverse accents.
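
For illustration only, the sketch below shows how WER is commonly computed: WER = (S + D + I) / N, where S, D, and I are word-level substitutions, deletions, and insertions against a reference transcript and N is the number of reference words. This is a minimal pure-Python example; the function name, normalization, and sample transcripts are hypothetical and are not taken from the study.

    # Minimal sketch of Word Error Rate between a reference transcript and an
    # ASR hypothesis, using a word-level Levenshtein (edit) distance.
    def wer(reference: str, hypothesis: str) -> float:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        # d[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # all deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j  # all insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    # Hypothetical example: one substitution out of four reference words -> 0.25
    print(wer("the cat sat there", "the cat sat their"))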