Week 3 Blog: Playing with DataFrames—Correcting Errors and Analyzing Fictional Poll Results
Introduction
This week's assignment involved working with a fictional dataset representing poll results from the 2016 U.S. Presidential election. The dataset included seven candidates and their results from two polling sources: ABC and CBS. However, before diving into the analysis, I encountered a few errors in the initial dataset that I had to correct.
Initial Errors and Corrections
The data I was provided had some formatting issues that needed fixing before I could proceed. For example:
Quotation Marks: Some of the names had incorrect quotation marks (“ ” instead of "), which R couldn’t recognize. I fixed this by replacing all smart quotes with regular quotation marks (").
Commas: The ABC and CBS poll results had some missing commas. Without a comma between two values, R cannot parse the vector and typically stops with an "unexpected numeric constant" error. I inserted the missing commas to correctly format the vectors.
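The smart-quote fix can also be done programmatically rather than by hand. The snippet below is only a sketch of that idea: the helper name straighten_quotes and the sample line are hypothetical, and it assumes the broken code is available as a character string.

```r
# A line of R code as it might arrive from a word processor, with curly quotes.
# (Hypothetical sample input, written with Unicode escapes.)
raw_line <- "Name <- c(\u201CJeb\u201D, \u201CDonald\u201D)"

# Hypothetical helper: replace left and right curly double quotes
# with the plain ASCII double quote that R expects.
straighten_quotes <- function(x) {
  x <- gsub("\u201C", "\"", x, fixed = TRUE)
  x <- gsub("\u201D", "\"", x, fixed = TRUE)
  x
}

fixed_line <- straighten_quotes(raw_line)
cat(fixed_line, "\n")
```

Using fixed = TRUE treats each pattern as a literal character instead of a regular expression, which keeps the replacement simple and predictable.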
Here’s how the corrected data looked:
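As a sketch of the shape the corrected vectors take: the names and numbers below are illustrative placeholders that I am assuming for the example, not necessarily the assignment's actual values.

```r
# Placeholder data for illustration only (assumed, not the original values).
# Every name uses straight quotes and every value is separated by a comma.
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
ABC  <- c(4, 62, 51, 21, 2, 14, 15)
CBS  <- c(12, 75, 43, 19, 1, 21, 19)

# The three vectors must have the same length to line up candidate by candidate.
stopifnot(length(Name) == length(ABC), length(ABC) == length(CBS))
```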
Analyzing the Data
After cleaning the data, I created a data frame to store the candidate names and their respective poll results from ABC and CBS:
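A minimal sketch of that step, assuming three equal-length vectors. The variable name polls and all of the values are placeholders chosen for this example.

```r
# Placeholder vectors (assumed values, for illustration only).
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
ABC  <- c(4, 62, 51, 21, 2, 14, 15)
CBS  <- c(12, 75, 43, 19, 1, 21, 19)

# Combine the vectors into one data frame: one row per candidate,
# one column per polling source.
polls <- data.frame(Name = Name, ABC = ABC, CBS = CBS,
                    stringsAsFactors = FALSE)

# Inspect the structure to confirm the column types.
str(polls)
```

Setting stringsAsFactors = FALSE keeps the names as plain character strings, which is the default from R 4.0 onward but is worth stating explicitly for older versions.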
Next, I wrote code to determine the winner of each poll by finding the candidate with the highest number of votes. My first version took a more manual approach; with some help, I later simplified it and made it more efficient.
Here’s the code I used to identify the winners of both polls:
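One way this can look with which.max() is sketched below. The data frame polls and its values are placeholders assumed for the example, so this shows the technique rather than the exact code from the assignment.

```r
# Placeholder data frame (assumed values, for illustration only).
polls <- data.frame(
  Name = c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie"),
  ABC  = c(4, 62, 51, 21, 2, 14, 15),
  CBS  = c(12, 75, 43, 19, 1, 21, 19),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the largest value,
# which is then used to look up the matching name.
abc_winner <- polls$Name[which.max(polls$ABC)]
cbs_winner <- polls$Name[which.max(polls$CBS)]

cat("ABC winner:", abc_winner, "\n")
cat("CBS winner:", cbs_winner, "\n")
```

Note that which.max() returns only the first index when there is a tie. If ties matter, polls$Name[polls$ABC == max(polls$ABC)] returns every candidate sharing the top value.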
Results
Using the corrected data and the improved code, I found the following results:
Donald won both polls decisively based on the fictional data provided.
Conclusion
This assignment helped me improve my data cleaning skills and practice handling issues in R such as formatting errors and missing separators. It also reinforced how to efficiently analyze data using functions like which.max() to determine the top performers in a dataset.
You can view the full code and project in my GitHub repository (named assignment3).




