2025-03-10

R base graphics

R allows users to create plots without the need to load any additional packages, using only the built-in functions.

Visualizing data is an essential step in data analysis, especially in biology, as it helps to better understand the distribution, relationships, and potential outliers in the data before performing more complex statistical analysis.

The main types of plots we can create are:

Histograms: using the hist() function, useful for visualizing the distribution of a continuous variable
Density plots: with plot(density()), to estimate the probability distribution of a continuous variable
Scatterplots: using plot(), ideal for exploring the relationship between two continuous variables
Boxplots: with boxplot(), to compare the distributions of a variable across different groups
Barplots: using barplot(), commonly used for categorical data or to compare the sizes of different groups

Histogram

A histogram is useful for visualizing the distribution of a continuous variable.

Let’s generate 100 random values from a normal distribution and plot them as a histogram.

x <- rnorm(100)
hist(x)

We can customize the histogram by adding arguments to the function.

For example, we can:

Change the fill color (col) and border color (border) of the bars
Add a title (main)
Adjust the number of bins (breaks)

hist(x, col = "pink", border = "blue", main = "My first histogram", breaks = 100)

By increasing the number of breaks, we get a more detailed view of the distribution.

However, too many breaks may make the histogram less readable.
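
When breaks is a single number, hist() treats it only as a suggestion and rounds the break points to "pretty" values, so the bin count may differ from the one requested. As a minimal sketch (the seed, the 0.5 bin width, and the variable names are illustrative assumptions), exact bins can be requested by passing a vector of break points:

```r
set.seed(1)                      # assumption: fixed seed so the example is reproducible
x <- rnorm(100)

# A single number is a suggestion: R picks "pretty" break points near it
h_suggested <- hist(x, breaks = 100, plot = FALSE)

# A vector of break points is used exactly as given;
# it must span the full range of the data
b <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h_exact <- hist(x, breaks = b, col = "pink", border = "blue",
                main = "Histogram with explicit breaks")

length(h_suggested$counts)       # number of bins R actually used
length(h_exact$counts)           # always length(b) - 1
```

Setting plot = FALSE returns the histogram object without drawing it, which is a convenient way to inspect the bins before plotting.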

Density Plot

A density plot is another way to visualize the distribution of a continuous variable.

Unlike histograms, density plots estimate the probability density function of the data, providing a smooth curve instead of discrete bars.

We draw 100 values from a normal distribution as before and create a density plot with:

x <- rnorm(100)  
plot(density(x))

We can further customize the plot by:

Filling the area under the curve with color
Changing the border color

To do this, we first store the density object and then use the polygon() function to add the shaded area:

d <- density(x)
plot(d)
polygon(d, col = "pink", border = "blue")

The polygon() function fills the area under the curve, making the visualization more appealing and easier to interpret.
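
A density curve can also be drawn on top of a histogram of the same data, which makes the two views easy to compare. The sketch below assumes a fixed seed and uses freq = FALSE so that the histogram is drawn on the density scale rather than the count scale:

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
x <- rnorm(100)

d <- density(x)                  # kernel density estimate
h <- hist(x, plot = FALSE)       # histogram object, used only to find the y range

# freq = FALSE puts the bars on the density scale so the curve is comparable
hist(x, freq = FALSE, col = "pink", border = "blue",
     ylim = c(0, max(h$density, d$y)),
     main = "Histogram with density curve")
lines(d, col = "blue", lwd = 2)  # add the curve to the existing plot
```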

Scatter plot

Scatter plots are widely used in bioinformatics as they help visualize the relationship between two variables.

We can create a second vector with 100 values from the normal distribution (just like x) and plot them in a scatter plot:

x <- rnorm(100)  
y <- rnorm(100) 
plot(x, y)

We can customize the plot by:

Adding a title (main)
Changing the axis labels (xlab, ylab)

plot(x, y, main = "My first scatterplot", xlab = "X", ylab = "Y")

We can also modify the point shape (pch) and the point color (col):

plot(x, y, main = "My first scatterplot", xlab = "X", ylab = "Y", pch = 16, col = "pink")

Some shapes allow additional customization, such as:

Changing the fill color (bg)
Changing the border color (col)

plot(x, y, main = "My first scatterplot", xlab = "X", ylab = "Y", pch = 24, col = "blue", bg = "pink")

The pch parameter controls the point shape. Values 21 to 25 are fillable symbols (24 is a triangle): for these, col sets the border color and bg sets the fill color.

Plotting symbols
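
The available symbols can be drawn directly in R. This is a small sketch; the grid layout and variable names are illustrative choices. It plots pch values 0 to 25 with a blue border and a pink fill, so only the fillable symbols (21 to 25) appear filled in pink:

```r
pch_values <- 0:25
col_pos <- pch_values %% 9        # column position in a 9-wide grid
row_pos <- 2 - pch_values %/% 9   # row position, first row at the top

plot(col_pos, row_pos, pch = pch_values, col = "blue", bg = "pink", cex = 2,
     xlim = c(-0.5, 8.5), ylim = c(-0.5, 2.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Plotting symbols (pch 0 to 25)")

# label each symbol with its pch value, just below the symbol
text(col_pos, row_pos, labels = pch_values, pos = 1, offset = 1)
```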

Boxplot

A boxplot is useful for visualizing the distribution, variability, and potential outliers of a dataset.

Using the same vectors we used for the scatter plot, we can create a boxplot to compare their distributions.

We can also use a vector of colors to distinguish the two variables:

boxplot(x, y, main = "My first boxplot", xlab = "Vector name", ylab = "Values", col = c("darkslategray1", "darkgoldenrod1"), names = c("X", "Y"))

The central box represents the interquartile range (IQR), containing the middle 50% of the data
The horizontal line inside the box is the median
The whiskers extend to the smallest and largest values within 1.5 times the IQR from the quartiles.
Points outside the whiskers are considered outliers.

By modifying parameters such as col (box color), names (group labels), and main (title), we can improve readability and presentation.

We can also change the orientation and add notches to the boxplots:

boxplot(x, y, main = "Boxplot with notches", ylab = "Vector name", xlab = "Values", col = c("darkslategray1", "darkgoldenrod1"), horizontal = TRUE, notch = TRUE, names = c("X", "Y"))

Notches in a boxplot represent a confidence interval around the median.

They are useful when comparing two or more distributions because:

If the notches of two boxplots do not overlap: this suggests a statistically significant difference between the medians
If the notches overlap: there is no strong evidence that the medians are different.
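
The notch limits can also be inspected numerically. As a sketch with a fixed seed, boxplot.stats() returns them in its conf element, computed as the median plus or minus 1.58 times the distance between the hinges (approximately the IQR) divided by the square root of the sample size:

```r
set.seed(1)                       # assumption: fixed seed for reproducibility
x <- rnorm(100)

s <- boxplot.stats(x)
s$conf                            # lower and upper notch limits

# the same limits computed by hand:
# s$stats holds the lower whisker, lower hinge, median, upper hinge, upper whisker
hinge_spread <- diff(s$stats[c(2, 4)])
manual <- s$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(s$n)
manual
```

Because the width shrinks with the square root of n, larger samples give narrower notches.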

Using notches can help in biological data analysis, where comparing distributions (e.g., gene expression levels, experimental results) is common.

Barplot

A barplot is useful for visualizing categorical data. For example, if we have a dataframe containing information about eye color frequency, we can represent it as a barplot.

Let’s define a small dataset with eye colors, their respective percentages, rarity, and a color scheme:

data <- data.frame(eyes_color = c("brown", "green", "grey", "blue"),
                   percentage = c(80, 10, 3, 7),
                   rarity = c("not rare", "rare", "rare", "rare"),
                   color = c("magenta", "yellow", "yellow", "yellow"),
                   stringsAsFactors = FALSE) # keep text columns as character vectors

data

##   eyes_color percentage   rarity   color
## 1      brown         80 not rare magenta
## 2      green         10     rare  yellow
## 3       grey          3     rare  yellow
## 4       blue          7     rare  yellow

We can plot these values using a barplot, assigning colors to each bar based on the dataset.

To improve readability, we can add a legend to indicate the rarity of each eye color:

barplot(height = data$percentage, names.arg = data$eyes_color, xlab = "Eyes colour", ylab = "Percentage", col = data$color)
legend("topright", legend = c("not rare","rare"), fill = c("magenta","yellow"))

If the category names on the x-axis are long or overlap, we can rotate them for better visibility using las = 2:

barplot(height = data$percentage, names.arg=data$eyes_color, xlab = "Eyes colour", ylab = "Percentage", col=data$color, las=2)
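
barplot() returns the x positions of the bar midpoints, which can be used to place value labels above each bar. The following is a minimal sketch that rebuilds the same values in a separate data frame (eye_data is a name chosen here for illustration) and widens the y-axis so the labels fit:

```r
eye_data <- data.frame(
  eyes_color = c("brown", "green", "grey", "blue"),
  percentage = c(80, 10, 3, 7)
)

# store the bar midpoints returned by barplot()
bp <- barplot(height = eye_data$percentage, names.arg = eye_data$eyes_color,
              ylim = c(0, 100), xlab = "Eyes colour", ylab = "Percentage",
              col = c("magenta", "yellow", "yellow", "yellow"))

# pos = 3 places each label just above the given coordinates
text(x = bp, y = eye_data$percentage,
     labels = paste0(eye_data$percentage, "%"), pos = 3)
```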

Overlay plots

R allows us to add multiple layers to a plot, combining different plot types.

This is useful for adding annotations, lines, or shapes to enhance data visualization.

Let’s start with a scatter plot and add:
- A margin text (mtext()) on the right side of the plot.
- A text inside the plot (text()) showing the correlation between x and y.

plot(x, y, main = "Scatterplot", xlab = "X",ylab = "Y", pch = 24, col = "blue", bg = "pink")
mtext(text = "This is a plot", side = 4)
cor_value <- cor(x, y)
text(1, 2, paste("The correlation is", round(cor_value, 2)))

We can also add lines, polygons, and other elements to enrich our visualization. For example, adding a regression line to the scatter plot:

plot(x, y, main = "Scatterplot", xlab = "X",ylab = "Y", pch = 24, col = "blue", bg = "pink")
abline(lm(y ~ x), col = "red", lwd = 2)

Other elements that can be added include:

Horizontal and vertical lines: abline(h=...), abline(v=...)
Custom shapes: polygon(), rect(), etc

Overlaying elements can be useful in biological data analysis, such as highlighting trends in gene expression or experimental measurements.
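
Several of these elements can be combined in one figure. The sketch below assumes a fixed seed and arbitrary rectangle coordinates. It adds a shaded rectangle, dashed lines at the means, a regression line, and a legend to a scatter plot:

```r
set.seed(1)                       # assumption: fixed seed for reproducibility
x <- rnorm(100)
y <- rnorm(100)

plot(x, y, main = "Scatterplot with overlays", xlab = "X", ylab = "Y",
     pch = 16, col = "grey40")

# shaded rectangle: rect(xleft, ybottom, xright, ytop)
rect(-1, -1, 1, 1, col = adjustcolor("pink", alpha.f = 0.3), border = NA)

# dashed horizontal and vertical lines at the means
abline(h = mean(y), lty = 2, col = "blue")
abline(v = mean(x), lty = 2, col = "blue")

# regression line from a fitted linear model
fit <- lm(y ~ x)
abline(fit, col = "red", lwd = 2)

legend("topleft", legend = c("means", "regression line"),
       col = c("blue", "red"), lty = c(2, 1), lwd = c(1, 2))
```

Each call draws on top of the previous ones, so the order of the calls determines which elements appear in front.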

Combining multiple plots

R allows us to combine multiple plots in the same graphic using the par() function.

The argument mfrow = c(nrows, ncols) defines the layout of the plotting area:
- nrows → Number of rows
- ncols → Number of columns

In the following example, we split the plotting area into two columns (mfrow = c(1, 2)) to display:
1. A density plot with a filled polygon
2. A scatter plot

par(mfrow = c(1, 2))
plot(d)
polygon(d, col = "pink", border = "blue")
plot(x,y, main = "Scatterplot", xlab = "X", ylab = "Y", pch = 24, col = "blue", bg = "pink")

To stack the plots vertically, we use mfrow = c(2, 1), meaning:

2 rows
1 column

par(mfrow = c(2, 1))
plot(d)
polygon(d, col = "pink", border = "blue")
plot(x,y, main = "Scatterplot", xlab = "X", ylab = "Y", pch = 24, col = "blue", bg = "pink")

Key points:

par(mfrow = ...) must be called before plotting

mfrow = c(1, 2): Plots side by side (1 row, 2 columns)
mfrow = c(2, 1): Plots stacked vertically (2 rows, 1 column)

This feature is useful for comparing different visualizations side by side, such as experimental data distributions in biological research.
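
A layout set with par(mfrow = ...) stays active for all later plots on the same device. When par() sets a parameter, it returns the previous value, so the earlier layout can be saved and restored afterwards. This is a sketch with a fixed seed, and old_par is an illustrative name:

```r
set.seed(1)                       # assumption: fixed seed for reproducibility
x <- rnorm(100)
y <- rnorm(100)
d <- density(x)

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))

plot(d, main = "Density plot")
polygon(d, col = "pink", border = "blue")
plot(x, y, main = "Scatterplot", xlab = "X", ylab = "Y",
     pch = 24, col = "blue", bg = "pink")

# restore the previous layout so later plots use the full device again
par(old_par)
layout_after <- par("mfrow")
```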

Save a plot

In R, we can save a plot in different formats, with the most common being:

PDF (recommended for high-quality outputs)
PNG (for raster images)
JPEG (for compressed images)

To save a plot, follow these steps:

Open a graphics device using one of the functions: pdf(), png(), or jpeg().
Create the plot (this will be saved to the device).
Close the graphics device using dev.off(), which finalizes the plot and saves it.

In the following example, we will save a plot with two panels (a density plot and a scatter plot) as a PDF:

pdf("name.pdf", width = 7, height = 3)
par(mfrow = c(1, 2))
plot(d, main = "Density Plot")
polygon(d, col = "pink", border = "blue")
plot(x, y, main = "Scatterplot", xlab = "X", ylab = "Y", pch = 24, col = "blue", bg = "pink")
dev.off()

Key points

Graphics device functions (pdf(), png(), jpeg()) determine the file format and settings.
Width and height can be adjusted to fit the desired plot dimensions. Note that pdf() measures them in inches, while png() and jpeg() use pixels by default.
dev.off() is essential to save and close the plot file.
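
The same three steps apply to raster formats. As a sketch, assuming R was built with PNG support, the example below writes to a temporary directory under the hypothetical file name scatter.png. The units and res arguments of png() give the size in inches at a chosen resolution:

```r
set.seed(1)                       # assumption: fixed seed for reproducibility
x <- rnorm(100)
y <- rnorm(100)

# hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "scatter.png")

# 1. open the device: 7 x 5 inches at 300 pixels per inch
png(out_file, width = 7, height = 5, units = "in", res = 300)

# 2. create the plot (it is drawn into the file, not on screen)
plot(x, y, main = "Scatterplot", xlab = "X", ylab = "Y",
     pch = 24, col = "blue", bg = "pink")

# 3. close the device to write the file
dev.off()

file.exists(out_file)
```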

Colors

Specify color names

The easiest way to specify a color is to enter its name as a string.

R contains a wide variety of color names and shades. You can view all the available color names by typing colors() in R.

Additionally, you can find several color guides online, such as http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf.

Let’s create a vector with all the color names and select 16 random colors to visualize:

all_colors <- colors()

set.seed(18) # Let's fix a seed to always obtain the same selection

some_colors <- sample(all_colors, 16)

library(scales)

show_col(some_colors)

Specify colors by hexadecimal code

You can also specify colors using their hexadecimal code, for example:

hex <- c("#CC0066", "#9933CC", "#3399FF", "#00FF00")

pie(rep(1, length(hex)), col = hex, labels = hex)

Create a vector of n contiguous colors

To generate a vector of n contiguous colors, use functions like:

rainbow(n)
heat.colors(n)
terrain.colors(n)
topo.colors(n)
cm.colors(n)

These functions are useful for creating color gradients.
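
As a small illustration (the choice of n = 6 and of the two palettes is arbitrary), the colors returned by these functions can be passed straight to the col argument of a plot:

```r
n <- 6

old_par <- par(mfrow = c(1, 2))   # two panels side by side

# each function returns a character vector of n color codes
barplot(rep(1, n), col = rainbow(n), axes = FALSE, main = "rainbow(6)")
barplot(rep(1, n), col = heat.colors(n), axes = FALSE, main = "heat.colors(6)")

par(old_par)                      # restore the previous layout
```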

Palettes

Manually built continuous palettes

You can create custom continuous color palettes by mixing two or more colors.

The functions colorRamp() and colorRampPalette() handle this “mixing” process. You can either use the full palette or extract a specific number of colors.

Here is an example where we create a continuous palette from three colors and display it:

pal <- colorRampPalette(c("magenta", "cyan", "yellow"))
pal(3) # get 3 colors from the created palette

## [1] "#FF00FF" "#00FFFF" "#FFFF00"

pie(rep(1, 3), col = pal(3) , labels = pal(3), clockwise = TRUE)

pal(20) # see how many shades we get

##  [1] "#FF00FF" "#E41AFF" "#C935FF" "#AE50FF" "#936BFF" "#7886FF" "#5DA1FF"
##  [8] "#43BBFF" "#28D6FF" "#0DF1FF" "#0DFFF1" "#28FFD6" "#43FFBB" "#5DFFA1"
## [15] "#78FF86" "#93FF6B" "#AEFF50" "#C9FF35" "#E4FF1A" "#FFFF00"

pie(rep(1, 20), col = pal(20) , labels = pal(20), clockwise = TRUE)
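
A continuous palette is often used to color points by the value of a numeric variable. One possible approach, sketched here with a fixed seed and 20 shades as assumptions, is to bin the values with cut() and use the bin index to pick a color:

```r
pal <- colorRampPalette(c("magenta", "cyan", "yellow"))

set.seed(1)                       # assumption: fixed seed for reproducibility
x <- rnorm(100)
y <- rnorm(100)

n_shades <- 20
# labels = FALSE makes cut() return the integer index of each value's bin
bin <- cut(y, breaks = n_shades, labels = FALSE)

# low y values are drawn in magenta, high y values in yellow
plot(x, y, pch = 16, col = pal(n_shades)[bin],
     main = "Points colored by Y value", xlab = "X", ylab = "Y")
```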

RColorBrewer palettes

RColorBrewer offers three main types of color palettes:

Sequential: Best for ordered data that progresses from low to high or vice versa.
Qualitative: Suited for categorical data where color differences do not imply magnitude.
Diverging: Emphasizes extremes at both ends of the data range.

To see all available palettes, use the following code:

library("RColorBrewer")
display.brewer.all()

To visualize a single palette, specify the number of colors you need:

display.brewer.pal(n = 9, name = 'PiYG')

display.brewer.pal(n = 5, name = 'PiYG')

You can also extract the hexadecimal color codes from a palette:

brewer.pal(n = 9, name = 'PiYG')

## [1] "#C51B7D" "#DE77AE" "#F1B6DA" "#FDE0EF" "#F7F7F7" "#E6F5D0" "#B8E186"
## [8] "#7FBC41" "#4D9221"

brewer.pal(n = 5, name = 'PiYG')

## [1] "#D01C8B" "#F1B6DA" "#F7F7F7" "#B8E186" "#4DAC26"

To use an RColorBrewer palette in a plot:

barplot(c(2,5,7), col = brewer.pal(n = 3, name = "PiYG"))

Viridis palettes

The viridis package offers a range of perceptually uniform color scales that are both colorblind-friendly and printable in grayscale.

In particular, viridis creators define their palettes as:

Colorful: covering as wide a range as possible to make differences easy to distinguish
Perceptually uniform: values close to each other have similar-appearing colors, while values that are far apart appear distinctly different, consistently across the entire range
Colorblind-friendly: ensuring that these properties hold true for people with common forms of colorblindness, as well as in grayscale printing
Aesthetically pleasing

Some of the available scales are: “viridis”, “magma”, “plasma”, “inferno”, “cividis”, “mako”, “rocket”, and “turbo”.

Each scale is also identified by a letter, which can be passed to the option argument instead of the name:

magma: A
plasma: B
inferno: C
viridis: D
cividis: E
rocket: F
mako: G
turbo: H

Here is how to use a viridis palette in a plot:

library(viridis)

myCol <- viridis(n = 4, option = "D") 

pie(rep(1, 4), col = myCol, labels = myCol)
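
If installing the viridis package is not an option, base R (version 3.6.0 or later) provides hcl.colors(), which includes a palette named "viridis" that approximates the same scale. This is offered as an alternative sketch, not as the viridis package's own implementation, and the colors may differ slightly from those returned by viridis():

```r
# base R approximation of the viridis scale (no extra package needed)
base_cols <- hcl.colors(n = 4, palette = "viridis")

pie(rep(1, 4), col = base_cols, labels = base_cols)

# hcl.pals() lists all palette names that hcl.colors() accepts
head(hcl.pals())
```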

Wesanderson palettes

The wesanderson package provides color palettes inspired by the films of Wes Anderson.

Here is an example:

library(wesanderson)
names(wes_palettes)

##  [1] "BottleRocket1"     "BottleRocket2"     "Rushmore1"        
##  [4] "Rushmore"          "Royal1"            "Royal2"           
##  [7] "Zissou1"           "Zissou1Continuous" "Darjeeling1"      
## [10] "Darjeeling2"       "Chevalier1"        "FantasticFox1"    
## [13] "Moonrise1"         "Moonrise2"         "Moonrise3"        
## [16] "Cavalcanti1"       "GrandBudapest1"    "GrandBudapest2"   
## [19] "IsleofDogs1"       "IsleofDogs2"       "FrenchDispatch"   
## [22] "AsteroidCity1"     "AsteroidCity2"     "AsteroidCity3"

wes_palette("Royal2")

You can use these palettes in plots as follows:

barplot(c(2,5,7), col = wes_palette(n = 3, name = "Royal2"))

ggsci palettes

The ggsci package offers a wide range of color palettes designed specifically for use with ggplot2 (more information is available in the package documentation).

One important feature of ggsci is the concept of palette families. A palette family consists of a set of related color schemes, where each family contains several variations of the same basic color theme.

For example, a family might include both light and dark versions of the same color palette, or different tones of the same color set. This allows for flexibility in choosing colors that maintain a consistent visual theme across different types of plots.

In addition, ggsci provides two types of palettes:

Discrete palettes: used when you have categorical (distinct) data and need a set of colors to represent each category.
Continuous palettes: used for data that has a range (e.g., numerical data) and needs a gradient of colors to represent values.

For example:

library(ggsci)
barplot(c(2,5,7), col = pal_simpsons("springfield")(3))  # Discrete palette from Simpsons

barplot(c(2,5,7), col = pal_material("indigo")(3)) # continuous palette from Material

barplot(c(2,5,7), col = pal_material("light-green")(3)) # continuous palette from Material

Other color palettes

Many other color palettes are available, such as those found in the MetBrewer package (based on famous paintings and sculptures housed at the Metropolitan Museum of Art) and Van Gogh’s palettes.

You can explore them for more creative color choices:

MetBrewer: MetBrewer GitHub repository
Van Gogh palettes: Van Gogh R package vignette

Margins

In base R plotting, you can control two margin areas:

Plot margin: This refers to the inner space surrounding the plot area.
Outer margin: This is the space outside the plot but within the figure window

You can modify these areas using the par() function with the appropriate arguments:

mar: Defines the margins for the plot area (inner space).
oma: Defines the outer margins (outside the plot area).

Both arguments require four values, which represent the space (in lines) for the bottom, left, top, and right sides of the plot, respectively.

For example, par(mar=c(4,0,0,0)) sets a margin of size 4 only on the bottom of the plot.
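
A common practical use of mar is to make room for long axis labels. In this sketch (the labels and the margin size of 8 lines are illustrative assumptions), the bottom margin is enlarged so that rotated category names are not cut off, and the previous margins are restored afterwards:

```r
# enlarge the bottom margin (first value) and keep the other defaults
old_par <- par(mar = c(8, 4, 4, 2) + 0.1)

barplot(c(80, 10, 3, 7),
        names.arg = c("brown eyes", "green eyes", "grey eyes", "blue eyes"),
        las = 2, ylab = "Percentage")

bottom_margin <- par("mar")[1]    # current bottom margin, in lines

par(old_par)                      # restore the previous margins
```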

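Alternatively, you can define margins in inches by passing the following arguments to par():

mai: Defines the plot margins in inches.
omi: Defines the outer margins in inches.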

Here’s an example of how to use these options in practice:

#  set the outer margin (all sides) to 3 lines of space
par(oma=c(3,3,3,3)) 
#  set the inner plot margin to specific values
par(mar=c(5,4,4,2) + 0.1)

# plot a basic empty plot (no points: type="n" hides the points)
plot(0:10, 0:10, type = "n", xlab = "X", ylab = "Y") 

# add text to the plot area, coloring it red
text(5,5, "Plot", col = "red", cex = 2)
box(col = "red")

# add labels in the margins and color them forestgreen
mtext("Margins", side = 3, line = 2, cex = 2, col = "forestgreen")  
mtext("par(mar=c(b,l,t,r))", side = 3, line = 1, cex = 1, col = "forestgreen")  
mtext("Line 0", side = 3, line = 0, adj = 1.0, cex = 1, col = "forestgreen")  
mtext("Line 1", side = 3, line = 1, adj = 1.0, cex = 1, col = "forestgreen")  
mtext("Line 2", side = 3, line = 2, adj = 1.0, cex = 1, col = "forestgreen")  
mtext("Line 3", side = 3, line = 3, adj = 1.0, cex = 1, col = "forestgreen")  
box("figure", col = "forestgreen")    
 
# add labels to the outer margin area and color it blue
# 'outer=TRUE' moves us from the figure margins to the outer margins
mtext("Outer Margin Area", side = 1, line = 1, cex = 2, col = "blue", outer = TRUE)  
mtext("par(oma=c(b,l,t,r))", side = 1, line = 2, cex = 1, col = "blue", outer = TRUE)  
mtext("Line 0", side = 1, line = 0, adj = 0.0, cex = 1, col = "blue", outer = TRUE)  
mtext("Line 1", side = 1, line = 1, adj = 0.0, cex = 1, col = "blue", outer = TRUE)  
mtext("Line 2", side = 1, line = 2, adj = 0.0, cex = 1, col = "blue", outer = TRUE)  
box("outer", col = "blue")
