Vectors

  1. Create a vector x using seq() with the odd numbers from 1 to 13.

Use the R function sum() to compute the sum of these numbers.

Use the R function mean() to compute the mean of these numbers.

x = seq(1, 13, by = 2)
sum(x)
## [1] 49
mean(x)
## [1] 7

  1. From x, extract the numbers greater than 10 using the which() function.
x[which(x>10)]
## [1] 11 13
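
As a side note, which() returns the positions of the TRUE values, so the same elements can also be extracted by indexing with the logical vector directly. A minimal sketch (the name idx is introduced here only for illustration):

```r
x = seq(1, 13, by = 2)
idx = which(x > 10)      # positions of the elements greater than 10
idx
## [1] 6 7
x[idx]                   # indexing by position
x[x > 10]                # indexing by the logical vector selects the same elements
same = identical(x[idx], x[x > 10])
same
## [1] TRUE
```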

  1. Create a character vector z with the following female names: "Giulia", "Maria", "Beatrice", "Roberta", "Barbara".

Then use both the grep() and grepl() functions to identify which names contain the lowercase letter "b".

What is the difference between their outputs?

Finally, extract from z the names that contain the uppercase letter "B".

z = c("Giulia", "Maria", "Beatrice", "Roberta", "Barbara")
grep("b",z)
## [1] 4 5
grepl("b",z)
## [1] FALSE FALSE FALSE  TRUE  TRUE
z[grepl("B",z)]
## [1] "Beatrice" "Barbara"
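
On the difference between the two outputs: grep() returns the integer positions of the matching elements, while grepl() returns a logical vector with one entry per element of z. Matching is case-sensitive by default, which is why "Beatrice" is not found when searching for "b". The sketch below shows two standard arguments of grep() that change this behaviour (the names matched and any_case are introduced here for illustration):

```r
z = c("Giulia", "Maria", "Beatrice", "Roberta", "Barbara")
matched = grep("b", z, value = TRUE)         # return the names instead of the positions
matched
## [1] "Roberta" "Barbara"
any_case = grep("b", z, ignore.case = TRUE)  # match both "b" and "B"
any_case
## [1] 3 4 5
```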

  1. Create a vector w containing the names of students in a school class: "Giovanni", "Mario", "Maria", "Luca", "Matteo", "Nadia".

Check whether the names in vector z (from the previous exercise) are present in the vector w.

If there are any matches, extract the indices of those names in w.

w = c("Giovanni", "Mario", "Maria", "Luca", "Matteo", "Nadia")
w %in% z
## [1] FALSE FALSE  TRUE FALSE FALSE FALSE
which(w %in% z)
## [1] 3
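
The solution above tests each element of w against z. The reverse direction, one entry per element of z, can be sketched as follows; match() is a related base function that returns the position in w directly, with NA for names that are absent (in_w and pos are illustrative names):

```r
z = c("Giulia", "Maria", "Beatrice", "Roberta", "Barbara")
w = c("Giovanni", "Mario", "Maria", "Luca", "Matteo", "Nadia")
in_w = z %in% w      # is each name of z present in w?
in_w
## [1] FALSE  TRUE FALSE FALSE FALSE
pos = match(z, w)    # position in w of each name of z, NA if absent
pos
## [1] NA  3 NA NA NA
```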

  1. Add attributes to the vector w. In particular, add surnames using the names() function, then add sex and age as attributes.

Then, extract the attribute age and assign it to a new vector a.

Display all attributes of w.

names(w) = c("Rossi", "Verdi", "Boschi", "Prati", "Comba", "Sottile")
attr(w, "sex") = c("M", "M", "F", "M","M", "F")
attr(w, "age") = c(6, 7, 5, 6, 6, 7)
  
a = attr(w, "age")

attributes(w)
## $names
## [1] "Rossi"   "Verdi"   "Boschi"  "Prati"   "Comba"   "Sottile"
## 
## $sex
## [1] "M" "M" "F" "M" "M" "F"
## 
## $age
## [1] 6 7 5 6 6 7
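
One point worth keeping in mind when working with custom attributes: subsetting with [ keeps names but drops attributes such as age. A minimal sketch on a shortened version of w (w_sub and kept are illustrative names):

```r
w = c("Giovanni", "Mario", "Maria")
names(w) = c("Rossi", "Verdi", "Boschi")
attr(w, "age") = c(6, 7, 5)

w_sub = w[1:2]                    # subsetting keeps names only
kept = names(attributes(w_sub))
kept
## [1] "names"
```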

  1. The operators ==, !=, >, <, >= and <= create logical vectors. Check the results of the following operations: myvec = 1:5, myvec > 3, myvec == 4, myvec <= 2, myvec != 4
myvec = 1:5
myvec > 3
## [1] FALSE FALSE FALSE  TRUE  TRUE
myvec == 4
## [1] FALSE FALSE FALSE  TRUE FALSE
myvec <= 2
## [1]  TRUE  TRUE FALSE FALSE FALSE
myvec != 4
## [1]  TRUE  TRUE  TRUE FALSE  TRUE

  1. Create a new vector that contains 1, TRUE, FALSE, 25 and 4. Explain what happens.
vector = c(1, TRUE, FALSE, 25, 4)
vector
## [1]  1  1  0 25  4

  1. Create a new vector that contains 1, 3, "a", "b" and "c". Explain what happens.
vector = c(1, 3, "a", "b", "c")
vector
## [1] "1" "3" "a" "b" "c"

  1. Create a new vector that contains 1, 3, "a", "b", "c" and FALSE. Explain what happens.
vector = c(1, 3, "a", "b", "c", FALSE)
vector
## [1] "1"     "3"     "a"     "b"     "c"     "FALSE"
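
What happens in the last three exercises is implicit coercion: an atomic vector can hold only one type, so c() converts all elements to the most flexible type present (logical, then numeric, then character). A short sketch that checks the resulting classes (v1, v2 and v3 are illustrative names used instead of vector):

```r
v1 = c(1, TRUE, FALSE, 25, 4)        # logical values become 1 and 0
v2 = c(1, 3, "a", "b", "c")          # numbers become strings
v3 = c(1, 3, "a", "b", "c", FALSE)   # the logical value becomes the string "FALSE"
class(v1)
## [1] "numeric"
class(v2)
## [1] "character"
class(v3)
## [1] "character"
```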

  1. Convert the vector w into a factor.

Define a specific order (choose the order yourself), then order the names in w based on the assigned levels.

w = factor(w, levels = c("Nadia", "Maria", "Luca", "Giovanni", "Mario", "Matteo"))
w[order(w)]
##  Sottile   Boschi    Prati    Rossi    Verdi    Comba 
##    Nadia    Maria     Luca Giovanni    Mario   Matteo 
## Levels: Nadia Maria Luca Giovanni Mario Matteo
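
Ordering works here because a factor stores each value as an integer code giving its position in the levels, and order() sorts by those codes. A sketch on an unnamed copy of the vector (f, codes and ord are illustrative names):

```r
w = c("Giovanni", "Mario", "Maria", "Luca", "Matteo", "Nadia")
f = factor(w, levels = c("Nadia", "Maria", "Luca", "Giovanni", "Mario", "Matteo"))
codes = as.integer(f)   # position of each name in the levels
codes
## [1] 4 5 2 3 6 1
ord = order(f)          # indices that sort f by level
ord
## [1] 6 3 4 1 2 5
```

Because order() follows the levels, sort(f) returns the same result as f[order(f)].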

  1. Random numbers: Random numbers with specific distributions (e.g. uniform or normal) are essential in statistical data analysis.

Look up the description of the R functions runif(), rnorm(), mean(), and sd() using the help documentation in RStudio.

The R function runif(10, min=0, max=1) generates 10 random numbers uniformly distributed between 0 and 1. Generate 1000 random numbers uniformly distributed between 0 and 3 and save them in a vector. Compute the mean of these numbers. Is the computed mean close to the expected theoretical mean of a uniform distribution over [0,3]?

The R function rnorm(10, mean = 0, sd = 1) generates 10 normally distributed random numbers with mean = 0 and standard deviation = 1. Generate 1000 normally distributed random numbers with mean = 150 and sd = 9 and save them in a vector. Compute the mean and the standard deviation of these numbers. Compare the computed values with the expected theoretical values. Are they reasonably close?

x = runif(1000, min = 0, max = 3)
mean(x)
## [1] 1.486472
y = rnorm(1000, mean = 150, sd = 9)
mean(y)
## [1] 150.0853
sd(y)
## [1] 8.910699
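
For a uniform distribution over [a, b], the theoretical mean is (a + b) / 2 and the theoretical standard deviation is (b - a) / sqrt(12), so the expected mean over [0, 3] is 1.5. The sketch below compares sample and theoretical values; the seed 42 is an arbitrary assumption, added only so that the numbers can be reproduced, and theo_mean and theo_sd are illustrative names:

```r
set.seed(42)                     # arbitrary seed (assumption), makes the draw reproducible
x = runif(1000, min = 0, max = 3)

theo_mean = (0 + 3) / 2          # theoretical mean of a uniform distribution over [0, 3]
theo_sd = (3 - 0) / sqrt(12)     # theoretical standard deviation over [0, 3]

abs(mean(x) - theo_mean)         # distance between sample and theoretical mean
abs(sd(x) - theo_sd)             # distance between sample and theoretical sd
```

Without set.seed(), every run produces different numbers, so the values shown in the solution above will not be reproduced exactly.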

  1. Generate 100 random numbers uniformly distributed between -80 and 120 and assign them to a vector nums.

Then, create a new logical vector indicating which numbers in nums satisfy at least one of the following conditions:

  • The absolute value is greater than 75 OR
  • The value is between 0 and 13
nums = runif(100, min = -80, max = 120)
nums_l = abs(nums)>75 | (nums > 0 & nums < 13)

nums
##   [1] -46.1153015 -59.2909253  -8.5534990 -47.9240989  46.2964176  60.9951218
##   [7] -22.8046999  35.4371912 -31.0305863   1.0048546 -25.3997105   0.5207665
##  [13] -23.3097322  60.6667385 104.7975734 114.9834824 -67.5412625  -4.5223283
##  [19] -62.8589926  -1.9582793 -68.6810493 -48.4418181  16.0722561 -33.2071443
##  [25]  90.1232893   2.8443955 -22.8403042 -46.3035518 -39.6746126  51.5426209
##  [31] 111.1396999 -70.9858922  25.4747805  -0.5470782  68.5852261  46.7273952
##  [37]  66.4146297 -23.4201649 -25.2018464 -11.5547051 108.6171707 -71.6402269
##  [43]  23.8103567 -44.8586126 -30.8675958  93.2024920 -46.2000846 103.3158337
##  [49] -37.1686622 101.2122630 -65.4970163  66.9912445  58.9211634 -64.7784680
##  [55] -74.5963060 -31.0843203  24.4805201  67.2006397 -69.4784654   6.6983800
##  [61]  97.9955484 -33.4186787 -70.9237373 -39.6366297 -49.2564709  50.7974973
##  [67]  -2.4833354 -10.8408773  10.7783984   6.8032658  14.3633629  17.2616962
##  [73]  54.8587621 -62.2336547 -38.9681543 115.2630816 -60.1242810 113.8202926
##  [79]   3.5693333 -10.9416838 117.3799537  37.6558318  -0.3865118 113.3143668
##  [85]  56.6956322  39.8203389  63.2655843   1.5152834  99.8682332 -12.0142423
##  [91]  53.3058910 -59.8223465 108.8681367 -49.9566196  82.2829523 -50.7047817
##  [97]  37.5346924  -0.3113664 -13.7231802 119.1626869
nums_l
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
##  [13] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25]  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
##  [49] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [61]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
##  [73] FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE
##  [85] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
##  [97] FALSE FALSE FALSE  TRUE
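
A long logical vector is easier to check when summarised. Since TRUE counts as 1 and FALSE as 0, sum() gives the number of values that satisfy at least one condition, and the logical vector can be used as an index to list them. A sketch with an arbitrary seed (an assumption, used only for reproducibility; n_true and selected are illustrative names):

```r
set.seed(1)                                        # arbitrary seed (assumption)
nums = runif(100, min = -80, max = 120)
nums_l = abs(nums) > 75 | (nums > 0 & nums < 13)

n_true = sum(nums_l)       # how many values satisfy at least one condition
selected = nums[nums_l]    # the values themselves
n_true
selected
```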

  1. Create a vector t that contains the first 10 multiples of 3.

Assign the following names to t: "Antonio", "Barbara", "Carlo", "Davide", "Enrico", "Fabio", "Giovanna", "Holly", "Irene", "Luca".

Create another vector sp that repeats c("A", "B") five times.

Assign the following names to sp: "Carlo", "Luca", "Barbara", "Enrico", "Fabio", "Giovanna", "Antonio", "Holly", "Irene", "Davide".

Use the match() function to assign an attribute Number to sp with numbers from t corresponding to the correct person.

Add another attribute Explanation to sp with the text "This is a race".

t = seq(3,30, 3)
names(t) = c("Antonio", "Barbara", "Carlo", "Davide", "Enrico", "Fabio", "Giovanna", "Holly", "Irene", "Luca")

sp = rep(c("A", "B"), 5)
names(sp) = c("Carlo","Luca","Barbara","Enrico", "Fabio", "Giovanna", "Antonio","Holly", "Irene","Davide")


attr(sp, "Number") = t[match(names(sp), names(t))]

attr(sp, "Explanation") = "This is a race"

sp
##    Carlo     Luca  Barbara   Enrico    Fabio Giovanna  Antonio    Holly 
##      "A"      "B"      "A"      "B"      "A"      "B"      "A"      "B" 
##    Irene   Davide 
##      "A"      "B" 
## attr(,"Number")
##    Carlo     Luca  Barbara   Enrico    Fabio Giovanna  Antonio    Holly 
##        9       30        6       15       18       21        3       24 
##    Irene   Davide 
##       27       12 
## attr(,"Explanation")
## [1] "This is a race"
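
To see what match() contributes in this solution, it helps to look at its result on its own: for each name of sp it returns the position of that name in names(t), and those positions are then used to index t. A sketch on the first three names of sp (sp_names and pos are illustrative names):

```r
t = seq(3, 30, 3)
names(t) = c("Antonio", "Barbara", "Carlo", "Davide", "Enrico", "Fabio", "Giovanna", "Holly", "Irene", "Luca")

sp_names = c("Carlo", "Luca", "Barbara")
pos = match(sp_names, names(t))   # position of each name within names(t)
pos
## [1]  3 10  2
t[pos]                            # values of t in the order of sp_names
```

Because t is a named vector, indexing by name with t[sp_names] selects the same elements.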

  1. Create a vector plant containing the words "leaf", "root", "brown", "green".

Create another vector colors containing: "red", "purple", "green" repeated 3 times, "orange" repeated 8 times, and "darkbrown".

Create a new vector matches that contains only the elements in colors that are also present in plant.

Compute and print the length of matches.

plant = c("leaf", "root", "brown", "green")
colors = c("red", "purple", rep("green", 3), rep("orange", 8),"darkbrown")


matches = colors[colors %in% plant]
length(matches)
## [1] 3
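
The result is 3 because %in% compares whole strings: only the three copies of "green" match, and "darkbrown" is not counted even though it contains "brown". A sketch contrasting exact comparison with a substring search (exact, partial and distinct are illustrative names):

```r
plant = c("leaf", "root", "brown", "green")
colors = c("red", "purple", rep("green", 3), rep("orange", 8), "darkbrown")

exact = "darkbrown" %in% plant          # exact comparison of whole strings
exact
## [1] FALSE
partial = grepl("brown", "darkbrown")   # substring search
partial
## [1] TRUE
distinct = unique(colors[colors %in% plant])   # matching values without repetitions
distinct
## [1] "green"
```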