Week 9: Exploring Cancer Survival Data Visualization in R
This week I explored cancer survival data using base R's barplot(), ggplot2, and an xyplot() with lattice. Here’s a breakdown of the journey, the challenges faced, and what I learned along the way.
The Data: Mean Survival Time by Organ
The dataset I worked with contains cancer survival times grouped by organ. To understand the average survival time for each organ, I first calculated the mean survival time per organ.
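A minimal sketch of this step, assuming a data frame named cancer with a numeric Survival column and an Organ column. The data frame name, the Survival column name, and the toy values are placeholders for illustration, not the real dataset:

```r
# Hypothetical toy data; the values are made up for illustration only.
cancer <- data.frame(
  Survival = c(100, 200, 300, 150, 250, 400, 500, 600, 900, 1100),
  Organ = c("Stomach", "Stomach", "Stomach", "Bronchus", "Bronchus",
            "Colon", "Colon", "Colon", "Breast", "Breast")
)

# Mean survival time per organ.
# The result is a data frame with one row per organ (sorted by organ name).
mean_survival <- aggregate(Survival ~ Organ, data = cancer, FUN = mean)
print(mean_survival)
```

tapply(cancer$Survival, cancer$Organ, mean) would give the same means as a named vector, which is the shape barplot() expects.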
Once I had the mean survival times, I set out to visualize the data using four different approaches, each with its unique set of functionalities and aesthetics.
1. Basic Bar Plot with Base R
My first plot used a simple barplot() to display the mean survival times. This method provided a quick and straightforward way to visualize the data, though it came with limitations in terms of customizability and aesthetics.
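A base R bar plot of this kind might look like the sketch below. The mean values are made up, and the title, labels, and color are illustrative choices rather than the settings used for the original plot:

```r
# Hypothetical mean survival times per organ (made-up values).
mean_survival <- c(Breast = 1000, Bronchus = 200, Colon = 500, Stomach = 200)

# barplot() accepts a named numeric vector; the names become the bar labels.
# It returns the x positions of the bar midpoints, handy for adding text later.
bar_midpoints <- barplot(
  mean_survival,
  main = "Mean Survival Time by Organ",
  xlab = "Organ",
  ylab = "Mean survival time",
  col  = "steelblue",
  las  = 2  # draw axis labels perpendicular to the axes so organ names fit
)
```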
Getting a first chart out of barplot() was easy. However, limitations appeared when I tried to adjust the spacing and aesthetics further. Base R plots are best for quick, simple visualizations, but they lack the advanced customization options available in more specialized packages.
2. Using ggplot2 for Advanced Customization
ggplot2 allowed me to go beyond basic customization and enhance the plot’s aesthetics and readability. With ggplot2, I created a plot with more vibrant colors and improved font control.
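As a rough sketch of what this can look like, the example below builds a bar chart from a small summary table. The table values are made up, and the palette, theme, and font sizes are assumptions chosen to show color and font control, not the original settings:

```r
library(ggplot2)

# Hypothetical summary table of mean survival per organ (made-up values).
mean_survival <- data.frame(
  Organ    = c("Breast", "Bronchus", "Colon", "Stomach"),
  Survival = c(1000, 200, 500, 200)
)

# Each `+` adds an independent layer or setting to the plot object.
p <- ggplot(mean_survival, aes(x = Organ, y = Survival, fill = Organ)) +
  geom_col() +                            # bar heights come straight from Survival
  scale_fill_brewer(palette = "Set2") +   # one color per organ
  labs(title = "Mean Survival Time by Organ",
       x = "Organ", y = "Mean survival time") +
  theme_minimal() +
  theme(
    plot.title      = element_text(size = 16, face = "bold"),
    axis.text.x     = element_text(angle = 45, hjust = 1),
    legend.position = "none"              # the x-axis already names each organ
  )

print(p)
```

Because every piece is a separate layer, swapping the palette or the theme does not require touching the rest of the plot.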
ggplot2 had a steeper learning curve, but its flexibility was worth it. It allowed me to control almost every aspect of the plot’s aesthetics and positioning, and the use of layers in ggplot2 gave me a powerful way to add or modify elements independently.
4. Scatter Plot with xyplot in Lattice
Finally, I experimented with xyplot() in lattice, creating a scatter plot that displayed individual survival times across organs. This plot format showed data distributions more effectively than the bar plots, adding another dimension to my visualization.
Challenges & Learnings: Before plotting, I converted the Organ variable to a factor using as.factor(). This conversion was crucial because it ensured that R treated Organ as a categorical variable rather than a numeric one, affecting how the data points were displayed on the x-axis. Configuring colors for each unique organ using xyplot() was challenging but rewarding. This plot style also added more context to the data, showing the variability of survival times within each organ, unlike the bar charts that only represented averages.
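The factor conversion and per-organ coloring can be sketched as follows. The data frame, the Survival column name, the toy values, and the color choices are all placeholders; only the Organ variable and the use of as.factor() and xyplot() come from the steps described above:

```r
library(lattice)

# Hypothetical toy data; the values are made up for illustration only.
cancer <- data.frame(
  Survival = c(100, 200, 300, 150, 250, 400, 500, 600, 900, 1100),
  Organ = c("Stomach", "Stomach", "Stomach", "Bronchus", "Bronchus",
            "Colon", "Colon", "Colon", "Breast", "Breast")
)

# Treat Organ as categorical so each organ gets its own x-axis position.
cancer$Organ <- as.factor(cancer$Organ)

# One color per factor level, in level order (alphabetical by default).
organ_colors <- c("tomato", "steelblue", "darkgreen", "purple")

# groups = Organ makes lattice cycle through `col` once per organ.
p <- xyplot(
  Survival ~ Organ,
  data   = cancer,
  groups = Organ,
  col    = organ_colors,
  pch    = 19,
  main   = "Survival Time by Organ",
  xlab   = "Organ",
  ylab   = "Survival time"
)

print(p)  # lattice plots are objects and must be printed to be drawn
```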
Comparison
Each plot provided unique insights:
- Bar Plot (Base R): A basic representation, useful for a quick view but limited in customization.
- ggplot2 Bar Chart: Provided advanced customization and a polished, publication-ready look.
- xyplot Scatter Plot: Added another layer of detail by showing individual survival times, giving more context to each organ’s data.
Final Thoughts
Through this journey, I learned a lot about R’s plotting ecosystem. While base R plots are great for rapid visualizations, lattice and ggplot2 offer progressively more control and customization options. The experience taught me that the right choice of plotting tool depends on the depth of information I wish to convey and the time I have available for design.
For a more detailed exploration of the data manipulation techniques and code examples, you can find everything in my GitHub repository.