
Week 3 Blog: Playing with DataFrames—Correcting Errors and Analyzing Fictional Poll Results

 






Introduction

This week's assignment involved working with a fictional dataset representing poll results from the 2016 U.S. Presidential election. The dataset included seven candidates and their results from two polling sources: ABC and CBS. However, before diving into the analysis, I encountered a few errors in the initial dataset that I had to correct.

Initial Errors and Corrections

The data I was provided had some formatting issues that needed fixing before I could proceed. For example:

  1. Quotation Marks: Some of the names were wrapped in smart quotes (“ ” instead of "), which R does not accept as string delimiters, so the code failed to parse. I fixed this by replacing every smart quote with a straight quotation mark (").

  2. Commas: The ABC and CBS poll results were missing some commas. Without a separator between two values, R stops with a syntax error instead of reading them. I inserted the missing commas so that each vector was correctly formatted.
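As a rough sketch of the two fixes above, the snippet below checks whether a line of code parses before and after the repair. The sample lines, their values, and the helper name `parses` are all hypothetical stand-ins, not the assignment's actual input.

```r
# Hypothetical broken lines (placeholder names and numbers, not the real data)
broken_quotes <- "Name <- c(\u201cDonald\u201d, \u201cCandidate B\u201d)"
broken_commas <- "ABC <- c(60, 10 25)"

# Hypothetical helper: TRUE when R can parse the text as code, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses(broken_quotes)  # smart quotes are not valid string delimiters
parses(broken_commas)  # two numbers with no comma between them

# Fix 1: replace left/right smart quotes with a straight double quote
fixed_quotes <- gsub("[\u201c\u201d]", "\"", broken_quotes)

# Fix 2: the missing comma is inserted by hand
fixed_commas <- "ABC <- c(60, 10, 25)"

parses(fixed_quotes)
parses(fixed_commas)
```

The `gsub()` step is only one way to do the replacement; for a short assignment, retyping the quotes in the editor works just as well.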

Here’s how the corrected data looked:






Analyzing the Data

After cleaning the data, I created a data frame to store the candidate names and their respective poll results from ABC and CBS:
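A minimal sketch of that step is below. Only the name "Donald" comes from this post; the other candidate names and every number are placeholders I made up for illustration, so the values will not match the assignment's dataset.

```r
# Placeholder data: seven candidates, two polls (values are illustrative only)
Name <- c("Donald", "Candidate B", "Candidate C", "Candidate D",
          "Candidate E", "Candidate F", "Candidate G")
ABC <- c(60, 10, 25, 15, 5, 20, 12)
CBS <- c(70, 12, 30, 18, 3, 22, 16)

# Combine the three vectors into one data frame, one row per candidate
polls <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)

# Inspect the structure: one character column and two numeric columns
str(polls)
```

Setting `stringsAsFactors = FALSE` keeps the names as plain character strings; this is already the default in R 4.0 and later, but spelling it out makes the behavior the same on older versions.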






Next, I wrote code to determine the winner of each poll by finding the candidate with the highest number of votes. My first version was more manual; with some help, I rewrote it to be shorter and easier to read.

Here’s the code I used to identify the winners of both polls:
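The original listing is not reproduced here, so the following is a sketch of the `which.max()` approach rather than my exact code. The data frame uses placeholder names and numbers (only "Donald" is taken from this post), and the variable names `abc_winner` and `cbs_winner` are hypothetical.

```r
# Placeholder data frame (illustrative values, not the assignment's numbers)
polls <- data.frame(
  Name = c("Donald", "Candidate B", "Candidate C", "Candidate D",
           "Candidate E", "Candidate F", "Candidate G"),
  ABC = c(60, 10, 25, 15, 5, 20, 12),
  CBS = c(70, 12, 30, 18, 3, 22, 16),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the largest value;
# that index then selects the matching candidate name
abc_winner <- polls$Name[which.max(polls$ABC)]
cbs_winner <- polls$Name[which.max(polls$CBS)]

cat("ABC poll winner:", abc_winner, "\n")
cat("CBS poll winner:", cbs_winner, "\n")
```

One thing to keep in mind: `which.max()` returns only the first position of the maximum, so a tie would report a single candidate. If ties matter, `polls$Name[polls$ABC == max(polls$ABC)]` returns every candidate who shares the top value.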








Results

Using the corrected data and the improved code, I found the following results:


Donald won both polls decisively based on the fictional data provided.

Conclusion

This assignment helped me improve my data cleaning skills and practice handling issues in R such as formatting errors and missing separators. It also reinforced how to efficiently analyze data using functions like which.max() to determine the top performers in a dataset.

You can view the full code and project in my GitHub repository (named assignment3).



