Cereal Correlation
Tejas Trivedi
Co-Presenters: Individual Presentation
College: College of Business and Public Management
Major: Public Administration (MPA)
Faculty Research Mentor: Yeonkyung Kim
Abstract:
The breakfast cereal data is a multivariate dataset of seventy-seven commonly available breakfast cereals, drawn from the information on the FDA-mandated food label. What do you get when you have a bowl of cereal? Can you get a lot of fiber without a lot of calories? Why are certain cereals placed on different shelf levels (bottom, middle, and top)? What are the possible correlations between calories, sugars, protein, carbs, and consumer ratings? The good news is that none of the cereals in the data have any cholesterol, and manufacturers at that time rarely used artificial sweeteners.
The manufacturers in the dataset are American Home Food Products, General Mills, Kellogg's, Nabisco, Post, Quaker Oats, and Ralston Purina (now known as Nestle). I selected this dataset because of its familiarity in day-to-day life. Breakfast, as we have all been taught, is a vital start to the day. Most of us ate cereal at some age and may still eat it today. I wanted to explore some of the correlations between the brands, manufacturers, health options, marketing perspective, and consumer ratings. I believe that examining a familiar subject through a data analysis lens will provide a better understanding of the topic.
The dataset consists of 16 columns (name, manufacturer, type, calories, protein, fat, sodium, fiber, carbs, sugar, potassium, vitamins, shelf, weight, cup, and consumer rating) and 77 rows, one per brand. The file is in CSV format. The data was collected in 1993, yet many of the brands are still familiar household names today.
The data had to be cleaned before proper analysis could occur. For example, a few of the carbs and protein figures were decimals and had to be reviewed and formatted as whole numbers. Once the figures were whole numbers, the proper pivot table could be set up to analyze the data.
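The cleaning, pivot-table, and correlation steps described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the workflow actually used for the presentation. The short column names, the helper functions, and the four sample rows are all assumptions made for the example; the sample values are placeholders and are not taken from the real dataset.

```python
import csv
import io
from collections import defaultdict
from math import sqrt

# Placeholder rows for illustration only: these values are NOT from the
# real cereal dataset, and the column names are assumed short forms.
SAMPLE_CSV = """name,shelf,calories,protein,carbs,sugar,rating
Cereal A,1,110,2.0,14.6,12,30.0
Cereal B,2,100,3.2,21.0,3,50.0
Cereal C,3,70,4.0,7.0,5,70.0
Cereal D,3,90,3.0,15.0,6,60.0
"""


def load_rows(text):
    """Parse CSV text and round the carbs and protein figures to whole numbers."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": raw["name"],
            "shelf": int(raw["shelf"]),
            "calories": int(raw["calories"]),
            # Note: Python's round() sends exact halves to the nearest even integer.
            "protein": round(float(raw["protein"])),
            "carbs": round(float(raw["carbs"])),
            "sugar": int(raw["sugar"]),
            "rating": float(raw["rating"]),
        })
    return rows


def mean_by_shelf(rows, column):
    """Pivot-table style summary: average of one column for each shelf level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["shelf"]].append(row[column])
    return {shelf: sum(vals) / len(vals) for shelf, vals in sorted(groups.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = load_rows(SAMPLE_CSV)
sugar_by_shelf = mean_by_shelf(rows, "sugar")
sugar_vs_rating = pearson([r["sugar"] for r in rows], [r["rating"] for r in rows])
print(sugar_by_shelf)
print(round(sugar_vs_rating, 2))
```

To run the same steps on the real file, the sample text would be replaced with the contents of the downloaded CSV, and the column names adjusted to match its header row.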
The data was also available in other formats, such as PDF and XLS; however, I recognized while gathering the dataset that CSV would be ideal.
The original dataset was compiled by Petra Isenberg, Pierre Dragicevic, and Yvonne Johnson at Carnegie Mellon University. The CSV file was downloaded from https://perso.telecom-paristech.fr/eagan/class/igr204/datasets.
Citation: Isenberg, Petra, et al. Project Datasets, 1993, perso.telecom-paristech.fr/eagan/class/igr204/datasets.