Evaluating Automatic Speech Recognition Models: How Well Do They Handle Accents?

Wei Liu

Co-Presenter: Yukun Xiong

College: The Dorothy and George Hennings College of Science, Mathematics and Technology

Major: Computer Science

Faculty Research Mentor: Daehan Kwak

Abstract:

The development of Automatic Speech Recognition (ASR) technology has progressed remarkably, becoming an integral component of virtual assistants, transcription services, and accessibility tools. Despite these advancements, ASR systems still struggle to accurately recognize speech from individuals with different accents and linguistic features. This work analyzes the performance of various ASR models, including cloud-based, local, and integrated speech recognition systems. For evaluation, we use several accented speech datasets and assess the ASR variants using Word Error Rate (WER) as the primary metric. The datasets include the Speech Accent Archive (SAA), L2-ARCTIC, and an Indian accent dataset. The results show that ASR accuracy varies depending on the speaker's language and accent: OpenAI Whisper, Deepgram, and AssemblyAI perform significantly better than conventional models such as Mozilla DeepSpeech. The results also indicate that many standalone ASR models are optimized for standard, non-regional English, leading to higher error rates for non-native and regionally accented speech. Future developments should focus on augmenting multilingual datasets and refining algorithms to achieve more equitable speech recognition across diverse accents.
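To make the evaluation metric concrete, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows one standard way to compute it via word-level edit distance; it is an illustrative example only, not the evaluation code used in this work, and the sample reference and hypothesis sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()

    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # Hypothetical reference transcript and ASR output for illustration.
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")  # 2 errors / 9 words ≈ 22.22%
```

In practice, a library such as jiwer is often used for this computation after normalizing case and punctuation in both transcripts, which keeps scores comparable across ASR systems.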
