COVID-19: Retelling the Story Through Topic Modeling and Sentiment Analysis
Emmanuel Babayemi
Co-Presenters: Individual Presentation
College: The Dorothy and George Hennings College of Science, Mathematics and Technology
Major: Computer Science
Faculty Research Mentor: Daehan Kwak
Abstract:
The COVID-19 pandemic, spanning late 2019 to late 2022, caused global disruption, resulting in over 700 million infections and 7 million deaths while also severely affecting economies, healthcare systems, and social life. During this period, social media sites such as X (formerly Twitter) became critical spaces for people to voice their thoughts and opinions about the pandemic. The objective of this study is to analyze how public sentiment and key topics evolved over time, using COVID-19-related text data from X. This will be achieved through topic modeling and sentiment analysis, creating a data-driven timeline of public opinion during the pandemic.
The first phase of this study is data collection, in which a containerized Python web-scraping program will hydrate tweet IDs from a database to collect millions of COVID-19-related tweets. The second phase focuses on understanding the data through topic modeling and sentiment analysis. Topic modeling will use LDA and BERT to identify the key topics people discussed and how those topics developed and changed over time. Sentiment analysis will use VADER and Stanza to determine whether people felt positive, negative, or neutral about the pandemic and related subtopics.
The results of this study will shed light on societal facets such as adherence to social distancing, utilization of healthcare systems, job security, and mask usage. These findings can inform the decisions of policymakers and public health officials in times of crisis and provide insights for predicting real-world events related to pandemics (e.g., an economic recession or case spikes). Additionally, this study provides a framework for analyzing the informal writing style of social media, advancing machine learning and data science techniques for unstructured text analysis.
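As a rough illustration of the "data-driven timeline" idea, the sketch below scores dated tweets and averages the scores per month. It is not the study's pipeline: the tiny word list `TOY_LEXICON` is a hypothetical stand-in for a real sentiment tool such as VADER or Stanza, the sample tweets are invented for the example, and the function names and the 0.05 labeling threshold are assumptions made here for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicon (assumption): a real analysis would rely on a
# validated tool such as VADER or Stanza rather than six hand-picked words.
TOY_LEXICON = {
    "good": 1.0, "safe": 1.0, "hope": 1.0,
    "bad": -1.0, "scared": -1.0, "lost": -1.0,
}


def score_text(text):
    """Return the mean lexicon score of the known words, or 0.0 if none match."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    hits = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score, threshold=0.05):
    """Map a score in [-1, 1] to positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def monthly_timeline(tweets):
    """Average sentiment per (year, month) for an iterable of (date, text) pairs."""
    buckets = defaultdict(list)
    for day, text in tweets:
        buckets[(day.year, day.month)].append(score_text(text))
    return {month: sum(scores) / len(scores)
            for month, scores in sorted(buckets.items())}


# Invented sample tweets, for illustration only.
SAMPLE_TWEETS = [
    (date(2020, 3, 14), "Scared and lost my job"),
    (date(2020, 3, 20), "Stay safe everyone"),
    (date(2021, 1, 5), "Good news, hope at last"),
]

for month, mean_score in monthly_timeline(SAMPLE_TWEETS).items():
    print(month, round(mean_score, 2), label(mean_score))
```

Swapping `score_text` for a call into a real sentiment library would leave the monthly aggregation unchanged, which is the part that produces the timeline.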