
Week 9 : Exploring Cancer Survival Data Visualization in R




In this assignment, I explored ways to visualize cancer survival data across different organs using several R plotting methods, including base R's barplot(), ggplot2, and lattice's xyplot(). Here's a breakdown of the journey, the challenges I faced, and what I learned along the way.

The Data: Mean Survival Time by Organ

The dataset I worked with contains cancer survival times, with each observation labeled by the organ affected. To understand the average survival time for each organ, I first calculated the mean survival time per organ.
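The original code isn't reproduced here, so below is a minimal sketch of one way to compute per-organ means. It assumes a data frame named `cancer` with a numeric `Survival` column and an `Organ` column. The values are invented placeholders for illustration, not the assignment's data.

```r
# Invented stand-in for the real dataset (assumption: columns Survival and Organ).
cancer <- data.frame(
  Survival = c(100, 200, 50, 70, 300, 500, 800, 1000),
  Organ    = c("Stomach", "Stomach", "Bronchus", "Bronchus",
               "Colon", "Colon", "Breast", "Breast")
)

# aggregate() with a formula returns one row per organ holding the mean survival time
mean_survival <- aggregate(Survival ~ Organ, data = cancer, FUN = mean)
print(mean_survival)
```

`tapply(cancer$Survival, cancer$Organ, mean)` gives the same numbers as a named vector, which can be passed straight to `barplot()`.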



Once I had the mean survival times, I set out to visualize the data using four different approaches, each with its own functionality and look.


1. Basic Bar Plot with Base R

My first plot used a simple barplot() to display the mean survival times. This method provided a quick and straightforward way to visualize the data, though it came with limitations in terms of customizability and aesthetics.


Challenges & Learnings: Customizing the title color, font, and axis labels was fairly easy in barplot(). However, I ran into limitations when I tried to fine-tune the spacing and other aesthetics. Base R plots are best for quick, simple visualizations, but they lack the advanced customization options available in more specialized packages.
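Below is a minimal sketch of a base R bar plot that styles the title and axis labels. It is not the author's original code. The data are invented placeholders, and the colors and label text are arbitrary choices.

```r
# Invented placeholder data (assumption: columns Survival and Organ)
cancer <- data.frame(
  Survival = c(100, 200, 50, 70, 300, 500, 800, 1000),
  Organ    = c("Stomach", "Stomach", "Bronchus", "Bronchus",
               "Colon", "Colon", "Breast", "Breast")
)

# Named vector of mean survival per organ; the names become the bar labels
mean_survival <- tapply(cancer$Survival, cancer$Organ, mean)

# barplot() returns the x midpoints of the bars (invisibly)
bar_mids <- barplot(
  mean_survival,
  main      = "Mean Cancer Survival Time by Organ",
  col.main  = "darkblue",   # title color
  font.main = 2,            # 2 = bold title font
  xlab      = "Organ",
  ylab      = "Mean survival time",
  col       = "steelblue",  # bar fill color
  las       = 1             # horizontal axis tick labels
)
```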


2. Using ggplot2 for Advanced Customization

ggplot2 allowed me to go beyond basic customization and enhance the plot’s aesthetics and readability. With ggplot2, I created a plot with more vibrant colors and improved font control.








Challenges & Learnings: ggplot2 had a steeper learning curve, but its flexibility was worth it. It allowed me to control almost every aspect of the plot’s aesthetics and positioning, and the use of layers in ggplot2 gave me a powerful way to add or modify elements independently.
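The layered approach can be sketched as follows. This is an illustrative example rather than the plot from the assignment. It assumes the ggplot2 package is installed and uses invented placeholder means. Each `+` adds one independent piece: bars, value labels, titles, a base theme, and then theme tweaks.

```r
library(ggplot2)

# Invented placeholder means (assumption); the real values come from the assignment data
mean_survival <- data.frame(
  Organ    = c("Breast", "Bronchus", "Colon", "Stomach"),
  Survival = c(900, 60, 400, 150)
)

p <- ggplot(mean_survival,
            aes(x = reorder(Organ, -Survival), y = Survival, fill = Organ)) +
  geom_col(width = 0.7) +                          # bars, heights from Survival
  geom_text(aes(label = round(Survival)),          # value label above each bar
            vjust = -0.4, size = 3.5) +
  labs(title = "Mean Cancer Survival Time by Organ",
       x = "Organ", y = "Mean survival time") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold", colour = "darkblue", hjust = 0.5),
        legend.position = "none")                  # organ is already on the x-axis

print(p)
```

Because every layer is separate, you can remove `geom_text()` or swap `theme_minimal()` for another theme without touching the rest of the plot.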



4. Scatter Plot with xyplot in Lattice

Finally, I experimented with xyplot() from the lattice package, creating a scatter plot of individual survival times for each organ. Because it plots every observation instead of a single mean, this format showed the spread of the data more effectively than the bar plots.




Challenges & Learnings: Before plotting, I converted the Organ variable to a factor using as.factor(). This step mattered because it made R treat Organ as a categorical variable rather than a numeric one, so each organ appeared at its own labeled position on the x-axis. Assigning a distinct color to each organ in xyplot() was challenging but rewarding. The scatter plot also added context: it showed how much survival times vary within each organ, whereas the bar charts only represented averages.
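Here is a minimal sketch of the factor conversion and per-organ coloring in lattice. It is not the author's exact code. The observations are invented placeholders, and the color names are arbitrary. Colors are assigned through `par.settings` so that each level of the `groups` factor gets its own color.

```r
library(lattice)

# Invented placeholder observations (assumption: columns Survival and Organ)
cancer <- data.frame(
  Survival = c(100, 200, 50, 70, 300, 500, 800, 1000),
  Organ    = c("Stomach", "Stomach", "Bronchus", "Bronchus",
               "Colon", "Colon", "Breast", "Breast")
)

# Treat Organ as categorical so each organ gets its own labeled x position
cancer$Organ <- as.factor(cancer$Organ)

# One color per organ level, matched to the levels in alphabetical order
organ_cols <- c("firebrick", "darkorange", "forestgreen", "steelblue")

p <- xyplot(
  Survival ~ Organ, data = cancer,
  groups       = Organ,      # color points by organ
  jitter.x     = TRUE,       # spread overlapping points horizontally
  par.settings = list(superpose.symbol = list(col = organ_cols, pch = 19)),
  main         = "Individual Cancer Survival Times by Organ",
  xlab         = "Organ",
  ylab         = "Survival time"
)

print(p)
```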



Comparison

Each plot provided unique insights:

  1. Bar Plot (Base R): A basic representation, useful for a quick view but limited in customization.
  2. ggplot2 Bar Chart: Provided advanced customization and a polished, publication-ready look.
  3. xyplot Scatter Plot: Added another layer of detail by showing individual survival times, giving more context to each organ’s data.

Final Thoughts

Through this journey, I learned a lot about R’s plotting ecosystem. While base R plots are great for rapid visualizations, lattice and ggplot2 offer progressively more control and customization options. The experience taught me that the right choice of plotting tool depends on the depth of information I wish to convey and the time I have available for design.


For a more detailed exploration of the data manipulation techniques and code examples, you can find everything in my GitHub repository.
