AI-Based Emotion Detection via Text
Ricardo Urbaez
Co-Presenters: Individual Presentation
College: Hennings College of Science, Mathematics and Technology
Major: B.S. Computer Science
Faculty Research Mentor: Navya Martin Kollapally
Abstract:
The rise of user-generated text has created large volumes of emotionally rich content. Understanding these emotions is essential for advancing mental health research (Calvo & D’Mello, 2010), improving conversational systems (Zhou et al., 2020), and promoting digital well-being (Mark & Czerwinski, 2023). Yet detecting emotions in everyday language is difficult because of sarcasm, slang, and uneven tone (Buechel & Hahn, 2018). This project examines whether modern language models can capture how people express emotions in text.

We developed a Python-based pipeline that collects Reddit posts, preprocesses the text, and applies large language models for classification. The system uses the GoEmotions dataset (Demszky et al., 2020), which defines 27 fine-grained emotions, as the foundation for evaluation. From more than two thousand posts, we created a benchmark of six hundred drawn from well-being subreddits such as r/depression and r/SuicideWatch. Each post was annotated with up to five labels following GoEmotions guidelines, providing a focused evaluation set for testing model accuracy in sensitive contexts. To ensure transparency, the pipeline also generates tables and figures that document outputs consistently.

Preliminary results show that admiration, approval, and gratitude occur frequently, while pride and grief appear rarely. The model performs well on short, direct statements and often suggests reasonable secondary labels. However, it struggles with sarcasm, ambiguous expressions, and posts containing mixed emotional signals, often defaulting to neutral. These findings highlight both the promise and the limitations of current language models in real-world use.

The next stage involves a formal evaluation comparing model predictions to the six hundred human-labeled posts. We will report precision, recall, micro and macro F1 scores, and confusion matrices. Additional testing will explore prompt variations, classification thresholds, and context window sizes.
Once this baseline is established, we will apply the same protocol to other models, including Qwen, Claude, and Gemini, to enable meaningful comparisons.

In summary, our findings suggest that while language models can capture many emotional patterns in real-world text, ambiguity and rare categories remain challenging. This project offers a balanced view of the capabilities and limits of emotion detection, with implications for mental health, digital well-being, and human–computer interaction.
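As a concrete illustration of the evaluation described above, the micro- and macro-averaged F1 scores for multi-label emotion predictions could be computed as follows. This is a minimal sketch, not the project's actual evaluation code; the function name and example labels are illustrative, and each post's gold and predicted annotations are assumed to be sets of GoEmotions label names.

```python
def f1_scores(gold, pred, labels):
    """Compute (micro_f1, macro_f1) for multi-label predictions.

    gold, pred: parallel lists of label sets, one per post.
    labels: the full list of emotion label names.
    """
    per_label_f1 = {}
    for lab in labels:
        # Per-label true positives, false positives, false negatives.
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_label_f1[lab] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    # Macro: unweighted mean of per-label F1 (rare labels count equally).
    macro = sum(per_label_f1.values()) / len(labels)

    # Micro: pool counts across all labels before computing F1
    # (frequent labels dominate).
    tp = sum(1 for g, p in zip(gold, pred)
             for lab in labels if lab in g and lab in p)
    fp = sum(1 for g, p in zip(gold, pred)
             for lab in labels if lab not in g and lab in p)
    fn = sum(1 for g, p in zip(gold, pred)
             for lab in labels if lab in g and lab not in p)
    micro = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return micro, macro


# Toy example with four emotion labels (illustrative data only):
gold = [{"gratitude"}, {"grief", "sadness"}, {"neutral"}]
pred = [{"gratitude"}, {"sadness"}, {"neutral"}]
micro, macro = f1_scores(gold, pred, ["gratitude", "grief", "sadness", "neutral"])
```

Because macro-averaging weights every label equally, it exposes poor performance on rare categories such as pride and grief that micro-averaging would mask.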