Week 11 - Debugging Journey: Fixing the tukey_multiple Function

Introduction

In this post, I’ll explain how I fixed a bug in the `tukey_multiple` function in R. The goal of this function is to find rows in a dataset where all the values are outliers. Initially, I ran into errors, but I eventually found the problem and fixed it. Here’s how I did it.

Understanding the `tukey_multiple` Function

The function `tukey_multiple` is supposed to:
Check each column in a dataset to see if values are outliers.
Return a logical vector that tells us whether each row is an outlier in every column.

Here’s the original code I started with:
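The version below is a reconstruction based on the problems described in the following steps (a call to an undefined `tukey.outlier`, the `&&` line, and a row-by-row loop). Treat it as a sketch of the starting point; the exact original code may have differed in details.

```r
# Reconstructed starting point (an assumption, not the verbatim original).
# It is intentionally left buggy to match the steps that follow.
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    # Problem 1: `tukey.outlier` is not defined anywhere.
    # Problem 2: `&&` does not compare element by element.
    outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}
```

Defining the function succeeds; the errors only appear once it is called.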

Step 1: Initial Problem - Missing Function

When I tried to run the code, I got an error saying `tukey.outlier` wasn’t found. After checking, I realized that `tukey.outlier` isn’t a built-in function in R and wasn’t defined anywhere else in the code. This function is supposed to check if values in a column are outliers, so I knew I needed to write my own function to do that.

What I Learned

Check if functions exist before using them, especially if they’re not from R’s built-in functions or packages.
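One way to make that check explicit is base R's `exists()`. This is a small sketch, assuming a fresh session with only the default packages attached and nothing named `tukey.outlier` defined by the user.

```r
# Is a function with this name visible from the current environment?
# In a fresh session nothing called `tukey.outlier` is defined.
has_helper <- exists("tukey.outlier", mode = "function")

# For comparison, `quantile` comes from the stats package,
# which is attached by default.
has_quantile <- exists("quantile", mode = "function")

print(c(tukey.outlier = has_helper, quantile = has_quantile))
```

Using `mode = "function"` avoids a false positive when an ordinary variable happens to share the name.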

Step 2: Creating a Helper Function for Outliers

To fix the missing function, I created my own `tukey_outlier` function. This function detects outliers based on Tukey’s Rule, which states that:
Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are outliers.
Here, Q1 is the first quartile (25th percentile), Q3 is the third quartile (75th percentile), and IQR is the interquartile range (Q3 - Q1).

Here’s my new helper function:
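A minimal sketch of such a helper, assuming a numeric vector as input and R's default quantile method; the exact body of the original helper may have differed.

```r
# Flag values that fall outside the Tukey fences of a numeric vector.
tukey_outlier <- function(x) {
  # First and third quartiles (R's default quantile type), without names.
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  # Element-wise comparison: one TRUE/FALSE per value of x.
  x < lower | x > upper
}

# Q1 = 2, Q3 = 4, IQR = 2, so the fences are -1 and 7; only 100 is outside.
flags <- tukey_outlier(c(1, 2, 3, 4, 100))
print(flags)
```

Missing values in `x` are ignored when computing the quartiles but stay `NA` in the result.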

What I Learned

Understanding Tukey’s Rule helped me create a simple function to detect outliers without needing a special library.

Step 3: Fixing the Logical Error

Once I defined `tukey_outlier`, I ran into a new issue with this line of code:

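The `&&` operator didn’t work correctly here because `&&` is meant for single logical values (for example, inside an `if` condition), not for comparing every element of an array. Older versions of R silently used only the first element of each side, and since R 4.3.0 an operand longer than one is an error. So, I replaced `&&` with a simpler check that applied `tukey_outlier` to each column directly: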
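The sketch below illustrates the difference and the column-wise replacement. The helper body and the small matrix `x` are assumptions made for the example, not the original data.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# `&` works element by element and returns a vector of the same length.
elementwise <- a & b

# `&&` is for single values, e.g. the first element of each vector.
single <- a[1] && b[1]

# Hypothetical helper following Tukey's Rule.
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Made-up data: the last row is extreme in both columns.
x <- cbind(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, -50))

# Fill each column of the flag matrix directly from the helper.
outliers <- array(TRUE, dim = dim(x))
for (j in seq_len(ncol(x))) {
  outliers[, j] <- tukey_outlier(x[, j])
}
print(outliers)
```

Because the matrix starts out as all `TRUE`, combining it with the helper's result adds nothing, so a plain assignment is enough.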

What I Learned

Use `&` (element-wise comparison) instead of `&&` when working with arrays in R.

Step 4: Improving Row-Wise Check for Outliers

Lastly, I wanted to check if each row had only outliers across all columns. Originally, I used a loop to check each row.
To make this part simpler, I replaced the loop with `apply(outliers, 1, all)`. This single line does the same thing, checking if every value in each row is an outlier.
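To see that the two approaches agree, here is a small comparison on a made-up logical matrix (the values are illustrative only).

```r
# Columns are filled first: column 1 is T, F, T and column 2 is T, T, F.
outliers <- matrix(c(TRUE, FALSE, TRUE,
                     TRUE, TRUE,  FALSE), ncol = 2)

# Loop version: one all() call per row.
outlier_vec <- logical(nrow(outliers))
for (i in seq_len(nrow(outliers))) {
  outlier_vec[i] <- all(outliers[i, ])
}

# apply version: MARGIN = 1 means "for each row".
outlier_vec_apply <- apply(outliers, 1, all)

print(outlier_vec_apply)
print(identical(outlier_vec, outlier_vec_apply))
```

Only the first row is `TRUE` in both columns, so only the first element of the result is `TRUE`.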

Final Solution

Here’s the final, corrected `tukey_multiple` function:
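Putting the steps together, a version consistent with the fixes described above could look like the sketch below. The helper body and the example matrix are assumptions, and the original final code may have differed in details.

```r
# Helper: flag values outside the Tukey fences (sketch).
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# TRUE for rows that are outliers in every column of x.
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- tukey_outlier(x[, j])
  }
  # Row-wise check: are all columns flagged for this row?
  apply(outliers, 1, all)
}

# Made-up data: only the last row is extreme in both columns.
x <- cbind(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, -50))
result <- tukey_multiple(x)
print(result)
```

`seq_len(ncol(x))` is used instead of `1:ncol(x)` so the loop is skipped cleanly if `x` has no columns.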

Testing and Results

I tested the fixed function on the `iris` dataset, and it worked as expected. Running `tukey_multiple(iris)` gave me a vector that showed which rows had all outliers.
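A test along these lines can be sketched as follows. The helper and function bodies are assumptions that follow the steps above. The call is restricted to the four numeric columns, because a `quantile`-based helper like this one does not accept the factor column `Species`.

```r
# Sketch of the helper and the fixed function (assumed bodies).
tukey_outlier <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- tukey_outlier(x[, j])
  }
  apply(outliers, 1, all)
}

# Numeric columns only: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
iris_flags <- tukey_multiple(iris[, 1:4])

length(iris_flags)  # one value per row of iris
sum(iris_flags)     # number of rows flagged in every column
```

With this sketch no row is flagged: `Sepal.Length` has no values outside its fences, so no row can be an outlier in all four columns at once.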

Conclusion

Through this debugging process, I learned:
The importance of defining all functions used in code.
How to apply Tukey’s Rule for finding outliers.
The difference between `&` and `&&` for logical operations.
How to simplify code with `apply` for row-wise or column-wise operations.
This experience helped me understand debugging better and taught me some useful R tricks for handling data more efficiently.
For a more detailed exploration of the data manipulation techniques and code examples, you can find everything in my GitHub repository.
