ggplot2

  1. Create a data frame age_weight from the vectors Age=c(0,1,2,3,4,5,6,7,8,9) and Weight=c(3.6,4.4,5.2,6,6.6,7.2,7.8,8.4,8.8,9.2)
age_weight <- data.frame(Age = 0:9,
                         Weight = c(3.6, 4.4, 5.2, 6, 6.6, 7.2, 7.8, 8.4, 8.8, 9.2))
  1. Draw a scatterplot of Age vs Weight using geom_point. Hint: when defining your aesthetics, map Age to x and Weight to y.
library(ggplot2)
ggplot(age_weight, aes(x=Age, y=Weight)) +
  geom_point()

  1. Give all points the same color by setting it outside the aesthetics, and fix the point size at 3.
ggplot(age_weight, aes(x=Age, y=Weight)) +
  geom_point(size=3, colour="blue2")
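
The difference between setting a color outside aes() and mapping it inside aes() can be sketched as follows. This is only an illustration: the object names p_set and p_map are made up here, and mapping colour to Weight is an assumed choice made to show the contrast.

```r
library(ggplot2)

# Same data as in the first exercise
age_weight <- data.frame(Age = 0:9,
                         Weight = c(3.6, 4.4, 5.2, 6, 6.6, 7.2, 7.8, 8.4, 8.8, 9.2))

# Setting: a constant outside aes(); every point gets the same colour, no legend
p_set <- ggplot(age_weight, aes(x = Age, y = Weight)) +
  geom_point(size = 3, colour = "blue2")

# Mapping: inside aes() the colour follows a variable; a legend is added
p_map <- ggplot(age_weight, aes(x = Age, y = Weight)) +
  geom_point(aes(colour = Weight), size = 3)

p_set
p_map
```

With the default point shape, colour paints the whole point. A separate border and fill are only available for shapes 21 to 25.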

  1. Notice that the two variables are clearly related. Change the geometry to geom_line to see another way to represent this plot.
ggplot(age_weight, aes(x=Age, y=Weight)) +
  geom_line()

  1. Combine the two plots by adding both a geom_line and a geom_point geometry to show both the individual points and the overall trend. Add a title to the plot.
ggplot(age_weight, aes(x=Age, y=Weight)) +
  geom_line()+
  geom_point(size=3, colour="blue2") +
  ggtitle('Relationship between Age and Weight')

  1. Load the iris dataset (it ships with base R in the datasets package, so it is available without ggplot2) and inspect the relationship between the sepal length and the sepal width. Which kind of plot can you use? Make it! If the type of plot you chose allows it, try changing colors, shapes and sizes.
library(ggplot2)
iris <- iris 
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
ggplot(iris, aes(x= Sepal.Length, y=Sepal.Width))+geom_point()

ggplot(iris, aes(x= Sepal.Length, y=Sepal.Width))+geom_point(col="firebrick",shape=2,size=4)

  1. If you haven’t already done so, rename the axis labels and add a proper title.
ggplot(iris, aes(x= Sepal.Length, y=Sepal.Width))+geom_point()+
  xlab("Sepal length")+ylab("Sepal width")+ggtitle("Sepal length vs sepal width in iris")
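
As an alternative sketch, labs() can set both axis labels and the title in a single call. The label texts and the name p_labs are illustrative choices, not part of the exercise. The units come from the iris help page, which gives the measurements in centimeters.

```r
library(ggplot2)

# One labs() call replaces xlab() + ylab() + ggtitle()
p_labs <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  labs(x = "Sepal length (cm)",
       y = "Sepal width (cm)",
       title = "Iris sepal dimensions")

p_labs
```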

  1. Explore the distribution of petal width across different species of iris. Try:
    • making a boxplot with the species as x and the petal width as y. Set notch=TRUE. Color the boxes according to species.
    • making a violin plot of the same values.
    • making a jitter plot.
    • making a histogram. What do you have to set as x? Does the histogram highlight the differences?
    • making a density plot. Remember to color the curves according to species.
    • making a summary plot.
    • making a histogram coupled with faceting.
    What kind of information does each plot give you?
ggplot(iris, aes(x= Species,fill=Species, y=Petal.Width))+geom_boxplot(notch = TRUE)
## Notch went outside hinges
## ℹ Do you want `notch = FALSE`?

ggplot(iris, aes(x= Species,fill=Species, y=Petal.Width))+geom_violin()

ggplot(iris, aes(x= Species,color=Species, y=Petal.Width))+geom_jitter() # default points have no fill, so map color instead

ggplot(iris, aes(x= Petal.Width,color=Species))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(iris, aes(x= Petal.Width,color=Species))+geom_density()

ggplot(iris, aes(x=Species,y= Petal.Width))+stat_summary()
## No summary function supplied, defaulting to `mean_se()`
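
The message above appears because stat_summary() was called without a summary function. A minimal sketch of passing one explicitly is shown here. The median variant and the object names are assumptions added for illustration.

```r
library(ggplot2)

# Explicit summary: mean with one standard error on each side, per species
p_mean <- ggplot(iris, aes(x = Species, y = Petal.Width)) +
  stat_summary(fun.data = mean_se, geom = "pointrange")

# Assumed alternative: show only the median of each species as a point
p_median <- ggplot(iris, aes(x = Species, y = Petal.Width)) +
  stat_summary(fun = median, geom = "point", size = 3)

# What mean_se() returns for one group: y = mean, ymin/ymax = mean -/+ SE
setosa_summary <- mean_se(iris$Petal.Width[iris$Species == "setosa"])
setosa_summary

p_mean
p_median
```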

ggplot(iris, aes(x= Petal.Width,color=Species))+geom_histogram()+facet_wrap(~Species)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
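
In the histograms above, color only changes the bar outlines. A possible variant, sketched here with an assumed binwidth of 0.1 and made-up object names, maps fill to the species and sets the bin width explicitly, which also avoids the `stat_bin()` message.

```r
library(ggplot2)

# fill colours the bars themselves; position = "identity" overlays the groups
# instead of stacking them, and alpha keeps the overlaps visible
p_hist <- ggplot(iris, aes(x = Petal.Width, fill = Species)) +
  geom_histogram(binwidth = 0.1, position = "identity", alpha = 0.6)

# Same histogram, one panel per species stacked vertically for easier comparison
p_facet <- p_hist + facet_wrap(~Species, ncol = 1)

p_hist
p_facet
```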

  1. Explore the 2D distribution using petal length as x and sepal length as y, applying:
    • geom_hex()
    • stat_density_2d()
    For examples and suggestions you can always consult the R Graph Gallery. In this case, see https://r-graph-gallery.com/2d-density-chart.html.
ggplot(iris, aes(x= Petal.Length, y=Sepal.Length))+geom_hex()

ggplot(iris, aes(x= Petal.Length, y=Sepal.Length))+stat_density_2d()
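
Two further ways to draw the 2D density are sketched below: contour lines over the raw points, and filled density levels. The alpha value and the object names are illustrative assumptions. The density estimate relies on the MASS package, which normally ships with R.

```r
library(ggplot2)

# Contour lines drawn on top of the individual observations
p_contour <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(size = 1) +
  geom_density_2d()

# Filled polygons: after_stat(level) is the density level computed by the stat
p_filled <- ggplot(iris, aes(x = Petal.Length, y = Sepal.Length)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 0.5)

p_contour
p_filled
```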

  1. Create a dataframe df from the following vectors:
person=c("Thomas", "Lisa", "Thomas", "Lisa", "Thomas", "Morris", "Morris", "Lisa", "Thomas", "Colin", "Colin", "Myrtha", "Colin", "Chloe", "Thomas", "Myrtha")
sport=c("yoga","yoga","tennis","crossfit","judo","football","ski","ski","weight_training","weight_training","power_lifting","pilates","nordic_walking","nordic_walking","nordic_walking","nordic_walking")

and then:

  • make a barplot of the number of sports played by each person (note that these data are unsummarized)
  • using the table() function, make a summarized data frame df2 that gives, for each person, the number of sports they play (Hint: use the as.data.frame() function)
  • make a barplot with the new summarized data frame
  • sort the bars of this last barplot in ascending order
  • change the axis names and fill the bars according to the person's name
  • Flip coordinates using +coord_flip()
df <- data.frame(person, sport)

ggplot(df, aes(x=person))+geom_bar()

df2=as.data.frame(table(df$person))

ggplot(df2, aes(x=reorder(Var1, Freq), y=Freq, fill=Var1))+
  geom_bar(stat="identity")+
  xlab("Name")+ylab("Number of sports")+
  coord_flip()
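
The summarizing step can also be sketched with readable column names instead of the default Var1 and Freq. The column name n_sports and the object name p_counts are assumptions chosen for this illustration. geom_col() is used as shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

person <- c("Thomas", "Lisa", "Thomas", "Lisa", "Thomas", "Morris", "Morris", "Lisa",
            "Thomas", "Colin", "Colin", "Myrtha", "Colin", "Chloe", "Thomas", "Myrtha")

# table() counts the rows per person; as.data.frame() turns the counts into
# one row per person. responseName renames the count column.
df2 <- as.data.frame(table(person), responseName = "n_sports")
df2

# Bars sorted in ascending order of the count, then flipped
p_counts <- ggplot(df2, aes(x = reorder(person, n_sports), y = n_sports, fill = person)) +
  geom_col() +
  xlab("Name") + ylab("Number of sports") +
  coord_flip()

p_counts
```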

  1. Add the column Times to df with the following commands:
set.seed(1234) # ensure reproducibility across randomization steps
df$Times=sample(c(1,2,3,4),nrow(df),replace = TRUE) # choose a random number in the range 1:4 for each row in df

Using person as x, Times as y and sport as fill, make a stacked barplot, a dodged barplot and a percentage barplot. Use col = "black" to outline the bars so that the groups are easier to tell apart.

ggplot(df,aes(x=person,y=Times,fill=sport))+
  geom_bar(stat = "identity",col="black")

ggplot(df,aes(x=person,y=Times,fill=sport))+
  geom_bar(stat = "identity",position="dodge",col="black")

ggplot(df,aes(x=person,y=Times,fill=sport))+
  geom_bar(stat = "identity",position="fill",col="black")
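
With position="fill" the y axis runs from 0 to 1. A small sketch of showing it as percentages follows, assuming the scales package is available (it is installed as a dependency of ggplot2). The axis label and the name p_pct are illustrative.

```r
library(ggplot2)

person <- c("Thomas", "Lisa", "Thomas", "Lisa", "Thomas", "Morris", "Morris", "Lisa",
            "Thomas", "Colin", "Colin", "Myrtha", "Colin", "Chloe", "Thomas", "Myrtha")
sport <- c("yoga", "yoga", "tennis", "crossfit", "judo", "football", "ski", "ski",
           "weight_training", "weight_training", "power_lifting", "pilates",
           "nordic_walking", "nordic_walking", "nordic_walking", "nordic_walking")
df <- data.frame(person, sport)

set.seed(1234)  # same seed as in the exercise
df$Times <- sample(c(1, 2, 3, 4), nrow(df), replace = TRUE)

# position = "fill" rescales each person's bar to a total of 1;
# scales::percent formats the axis ticks as percentages
p_pct <- ggplot(df, aes(x = person, y = Times, fill = sport)) +
  geom_col(position = "fill", colour = "black") +
  scale_y_continuous(labels = scales::percent) +
  ylab("Share of Times")

p_pct
```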

  1. After loading the iris dataset, make a scatterplot with the sepal length as x, the sepal width as y and the petal width as point size. Also:
    • fix shape=21

    • relate fill color to petal length (using fill=)

    • relate border color to iris species

    • choose a color scale for the fill (use +scale_fill_viridis(); you have to load the viridis library first). Note: there are many color scale functions designed for ggplot2, and you can also define your own palettes with scale_fill_manual(). You will see some examples.

    • Choose a manual scale for the border colors. Hint: use +scale_color_manual() with three values, since there are three iris species, for example +scale_color_manual(values=c("magenta", "orange", "cyan")). You can also assign a specific color to a specific value, like this: +scale_color_manual(values=c("virginica"="magenta","versicolor"="orange", "setosa"="cyan")). Try a few examples to get comfortable!

    • Add a fixed alpha value (transparency). Alpha accepts values between 0 and 1.

    • Apply theme_bw()

    • make a scatterplot using Sepal.Length and Petal.Length as variables. Then add the linear regression line using geom_smooth with method="lm"

    • make the same plot using facet on the variable Species (try with both facet_wrap and facet_grid)

    • what are the differences between scales="free_x", scales="free_y" and scales="free" inside facet_wrap()?

library(ggplot2)
library(viridis)
## Loading required package: viridisLite
ggplot(iris, aes(x= Sepal.Length, y=Sepal.Width,size=Petal.Width , fill=Petal.Length, color=Species))+
  geom_point(shape=21, alpha=0.9)+
  scale_fill_viridis()+
  scale_color_manual(values=c("virginica"="magenta","versicolor"="orange", "setosa"="cyan"))+
  theme_bw()
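
A variant of the plot above that does not need the viridis package can be sketched with scale_fill_viridis_c(), which ships with ggplot2. The "magma" option, the stroke value and the vector name species_cols are assumptions for illustration. Naming the manual colors ties each color to a species regardless of factor level order.

```r
library(ggplot2)

# Named vector: each colour is bound to a species by name, not by position
species_cols <- c(setosa = "cyan", versicolor = "orange", virginica = "magenta")

p_scales <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             size = Petal.Width, fill = Petal.Length,
                             colour = Species)) +
  geom_point(shape = 21, alpha = 0.9, stroke = 1) +  # shape 21 has border and fill
  scale_fill_viridis_c(option = "magma") +           # continuous fill scale
  scale_colour_manual(values = species_cols) +       # discrete border colours
  theme_bw()

p_scales
```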

ggplot(iris, aes(x= Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'

ggplot(iris, aes(x= Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth(method = "lm")+
  facet_grid(~Species)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(iris, aes(x= Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth(method = "lm")+
  facet_wrap(~Species)
## `geom_smooth()` using formula = 'y ~ x'

ggplot(iris, aes(x= Sepal.Length, y=Petal.Length))+
  geom_point()+
  geom_smooth(method = "lm")+
  facet_wrap(~Species,scales = "free")
## `geom_smooth()` using formula = 'y ~ x'
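
To compare the scales options side by side, the four variants can be built from one base plot, as in this sketch. The object names are made up here, and formula = y ~ x is passed explicitly only to silence the message shown above.

```r
library(ggplot2)

base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

p_fixed  <- base_plot + facet_wrap(~Species)                     # shared x and y axes
p_free_x <- base_plot + facet_wrap(~Species, scales = "free_x")  # own x range, shared y
p_free_y <- base_plot + facet_wrap(~Species, scales = "free_y")  # shared x, own y range
p_free   <- base_plot + facet_wrap(~Species, scales = "free")    # own x and y range

p_fixed
p_free_x
p_free_y
p_free
```

Shared axes make the panels directly comparable. Free axes zoom in on each species, so the within-species trend is easier to read, but values can no longer be compared across panels by position alone.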