Week: 11 - Debugging Journey: Fixing the tukey_multiple Function

Introduction

In this post, I’ll explain how I fixed a bug in the tukey_multiple function in R. The goal of this function is to find rows in a dataset where all the values are outliers. Initially, I ran into errors, but I eventually found the problem and fixed it. Here’s how I did it.

Understanding the tukey_multiple Function

The function tukey_multiple is supposed to:

  1. Check each column in a dataset to see if values are outliers.
  2. Return a logical vector that tells us whether each row contains only outliers across all columns.

Here’s the original code I started with:


Step 1: Initial Problem - Missing Function

When I tried to run the code, I got an error saying tukey.outlier wasn’t found. After checking, I realized that tukey.outlier isn’t a built-in function in R and wasn’t defined anywhere else in the code. This function is supposed to check if values in a column are outliers, so I knew I needed to write my own function to do that.

What I Learned

  • Check if functions exist before using them, especially if they’re not from R’s built-in functions or packages.

Step 2: Creating a Helper Function for Outliers

To fix the missing function, I created my own tukey_outlier function. This function detects outliers based on Tukey’s Rule, which states that:

  • Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are outliers.
  • Here, Q1 is the first quartile (25th percentile), Q3 is the third quartile (75th percentile), and IQR is the interquartile range (Q3 - Q1).

Here’s my new helper function:
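A minimal sketch of such a helper, assuming the quartiles come from R's quantile() with its default method and that missing values are ignored when computing them (both choices are assumptions, and the example vector is made up for illustration):

```r
# Flag values that fall outside Tukey's fences.
# Assumptions: quantile()'s default method is used for Q1 and Q3,
# and NA values are dropped when the quartiles are computed.
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]            # interquartile range (Q3 - Q1)
  lower <- q[1] - 1.5 * iqr     # lower fence
  upper <- q[2] + 1.5 * iqr     # upper fence
  x < lower | x > upper         # TRUE where a value is an outlier
}

# Hypothetical example data: one value far from the rest.
values <- c(10, 12, 11, 13, 12, 50)
tukey_outlier(values)
# FALSE FALSE FALSE FALSE FALSE  TRUE
```

For this vector, Q1 is 11.25 and Q3 is 12.75, so the fences are 9 and 15 and only 50 is flagged.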



What I Learned

  • Understanding Tukey’s Rule helped me create a simple function to detect outliers without needing a special library.

Step 3: Fixing the Logical Error

Once I defined tukey_outlier, I ran into a new issue with this line of code:


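The && operator didn’t work correctly here because && compares single (length-one) logical values in R rather than every element of an array. Older versions of R silently used only the first element of each side, and R 4.3.0 and later raise an error when either side is longer than one. So, I replaced && with a simpler check that applied tukey_outlier to each column directly: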
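One way this column-by-column check could look is sketched below. The helper body, the loop, and the small data frame are assumptions for illustration, not necessarily the exact code from the fix:

```r
# Assumed helper: TRUE where a value lies outside Tukey's fences.
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Hypothetical data: column a has one extreme value, column b has none.
x <- data.frame(a = c(1, 2, 3, 4, 100), b = c(5, 6, 7, 8, 9))

# One logical column per data column, filled directly from the helper.
outliers <- array(TRUE, dim = dim(x))
for (j in seq_len(ncol(x))) {
  outliers[, j] <- tukey_outlier(x[, j])
}
outliers
```

Each column of outliers now holds one TRUE/FALSE per row, so no && is needed at all.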


What I Learned

  • Use & (element-wise AND) instead of && when working with vectors or arrays in R.
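The difference is easiest to see on two short logical vectors (made up here for illustration):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# & works element by element and returns one result per position.
elementwise <- a & b
elementwise
# TRUE FALSE FALSE

# && is meant for single TRUE/FALSE values, such as a condition in if ().
scalar <- a[1] && b[1]
scalar
# TRUE
```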

Step 4: Improving Row-Wise Check for Outliers

Lastly, I wanted to check if each row had only outliers across all columns. Originally, I used a loop to check each row.

To make this part simpler, I replaced the loop with apply(outliers, 1, all). This single line does the same thing, checking if every value in each row is an outlier.
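A small sketch comparing the two approaches on a made-up logical matrix (the variable names are assumptions for illustration):

```r
# Hypothetical outlier flags: 3 rows, 2 columns (filled column by column).
outliers <- matrix(c(TRUE, TRUE, FALSE,
                     TRUE, FALSE, FALSE), ncol = 2)

# Loop version: test each row in turn.
outlier_vec <- logical(nrow(outliers))
for (i in seq_len(nrow(outliers))) {
  outlier_vec[i] <- all(outliers[i, ])
}

# apply version: MARGIN = 1 means "apply all() to each row".
row_all <- apply(outliers, 1, all)

outlier_vec
# TRUE FALSE FALSE
identical(outlier_vec, row_all)
# TRUE
```

Only the first row is TRUE in both columns, so it is the only row reported as containing nothing but outliers.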

Final Solution

Here’s the final, corrected tukey_multiple function:
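Putting the steps together, a corrected version might look like the sketch below. It is a reconstruction under stated assumptions, not necessarily the exact final code: the helper body is assumed, and non-numeric columns are skipped because quantile() cannot handle a factor such as iris$Species.

```r
# Assumed helper: TRUE where a value lies outside Tukey's fences.
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

tukey_multiple <- function(x) {
  x <- as.data.frame(x)
  # Assumption: only numeric columns are checked.
  num <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]

  # One logical column of outlier flags per numeric column.
  outliers <- matrix(TRUE, nrow = nrow(num), ncol = ncol(num))
  for (j in seq_len(ncol(num))) {
    outliers[, j] <- tukey_outlier(num[[j]])
  }

  # TRUE only for rows where every column is an outlier.
  apply(outliers, 1, all)
}

# Hypothetical data: the last row is extreme in both columns.
toy <- data.frame(a = c(1, 2, 3, 4, 100), b = c(5, 6, 7, 8, 90))
tukey_multiple(toy)
# FALSE FALSE FALSE FALSE  TRUE

# On iris, the result has one entry per row.
result <- tukey_multiple(iris)
length(result)
# 150
```

With this sketch, no iris row is flagged: Sepal.Length has no values outside its fences, so no row can be an outlier in every column.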


Testing and Results

I tested the fixed function on the iris dataset, and it worked as expected. Running tukey_multiple(iris) gave me a vector that showed which rows had all outliers.

Conclusion

Through this debugging process, I learned:

  1. The importance of defining all functions used in code.
  2. How to apply Tukey’s Rule for finding outliers.
  3. The difference between & and && for logical operations.
  4. How to simplify code with apply for row-wise or column-wise operations.

This experience helped me understand debugging better and taught me some useful R tricks for handling data more efficiently.

For a more detailed exploration of the data manipulation techniques and code examples, you can find everything in my GitHub repository.
