Create a dataframe exam_rates from the following vectors:
name=c("Marie", "Gianni", "Silvia", "Laura", "Mariachiara", "Simone", "Sarah", "Francesca", "Matteo")
id=c(8452, "AE12", 6732, "AS54", "GF49", 9328, "DS34", 3476, 7628)
pointsTest1=c(12, 15, 14, 5, 18, 3, 10, 7, 16)
pointsTest2=c(10, 9, 10, 7, 10, 3, 9, 8, 1)
pointsTest3=c(1, 2, 2, 2, 2, 1, 1, 0, 0)
Add to exam_rates a column score containing the sum of the pointsTest1, pointsTest2 and pointsTest3 columns, and a column average with the mean of these 3 columns. Hint: see the functionalities of rowSums(), colSums(), rowMeans() and colMeans()
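As a small illustration of the functions named in the hint, here is a minimal sketch on toy data. The data frame toy and its columns t1, t2, t3 are hypothetical and are not the exercise data.

```r
# Toy data frame (hypothetical values, not the exercise data)
toy <- data.frame(
  name = c("A", "B", "C"),
  t1 = c(10, 4, 7),
  t2 = c(8, 6, 9),
  t3 = c(2, 0, 1)
)

# Columns to aggregate, selected by name
point_cols <- c("t1", "t2", "t3")

# rowSums()/rowMeans() work across columns: one result per row
toy$score <- rowSums(toy[, point_cols])
toy$average <- rowMeans(toy[, point_cols])

# colSums()/colMeans() work down the rows: one result per column
col_totals <- colSums(toy[, point_cols])

print(toy)
print(col_totals)
```

Selecting the columns by name keeps the character column name out of the computation; rowSums() would stop with an error on non-numeric input.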
Add a column passed containing TRUE if the score is > 17 and FALSE otherwise. Hint: remember the theoretical part on logical vectors.
table() function.
Create a dataframe exam_rates2 by selecting from the previous dataframe only the people that passed the exam and the columns name, pointsTest1, pointsTest2 and pointsTest3
exam_rates2. Inspect the structure of the dataframe. What happened?
Create a vector vec by extracting the value column from the dataset exam_rates2. Then evaluate the sum, mean, standard deviation, summary and quantiles of the vector. Hint: use the sum(), mean(), sd(), summary() and quantile() functions. Take some time to inspect the results
Create a list collection that contains exam_rates, exam_rates2, vec and the quantiles of vec
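The building blocks used in the steps above (a logical column, table(), row and column selection, summary statistics, a named list) can be sketched on toy data as follows. All object names here (toy, toy_passed, bundle and so on) are hypothetical, and the sketch is only one possible way to combine these functions.

```r
# Toy data (hypothetical), standing in for a data frame of scores
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  score = c(20, 12, 18, 9)
)

# A comparison returns a logical vector: one TRUE/FALSE per row
toy$passed <- toy$score > 17

# table() counts how often each value occurs
passed_counts <- table(toy$passed)
print(passed_counts)

# Keep only the rows where passed is TRUE, and only some columns
toy_passed <- toy[toy$passed, c("name", "score")]

# Summary statistics of a numeric vector
v <- toy$score
v_sum <- sum(v)
v_mean <- mean(v)
v_sd <- sd(v)
v_quant <- quantile(v)
print(summary(v))

# A named list can hold objects of different types and sizes
bundle <- list(all = toy, passed = toy_passed, v = v, quantiles = v_quant)
print(names(bundle))
```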
Create student_marks by binding the following vectors as rows.
Marie=c(28, 29, 30, 27, 30)
Gianni=c(18, 19, 21, 26, 28)
Silvia=c(23, 24, 26, 30, 22)
Laura=c(30,29, 30, 28, 29)
Mariachiara=c(29, 30, 28, 27, 25)
Simone=c(24, 29, 30, 22, 19)
Sarah=c(26, 28, 25, 29, 30)
Francesca=c(18, 26, 28, 19, 26)
Matteo=c(26, 28, 30, 30, 28)
Assign the column names CourseA, CourseB, CourseC, CourseD, CourseE
Add the result to the collection list and name it. Hint: remember how to concatenate lists; a list can also contain a single element
age=c(21, 24, 22, NA, NA, 20, 23, NA, NA)
names(age)=c("Marie", "Gianni", "Silvia", "Laura", "Mariachiara", "Simone", "Sarah", "Francesca", "Matteo")
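For a named numeric vector with missing values such as age, two common ways of averaging while ignoring NA, and of picking elements by a pattern in their names, might look like the sketch below. The vector ages_demo is hypothetical, and which two ways the exercise has in mind is an assumption.

```r
# Hypothetical named vector with missing values
ages_demo <- c(Anna = 30, Bob = NA, Maria = 26, Tom = 22)

# Way 1: let mean() drop NA values itself
mean_narm <- mean(ages_demo, na.rm = TRUE)

# Way 2: remove NA values first, then average
mean_manual <- mean(ages_demo[!is.na(ages_demo)])

# grepl() returns a logical vector, grep() returns the matching positions
has_ia <- grepl("ia", names(ages_demo))
idx_ia <- grep("ia", names(ages_demo))

# Both can be used to index the vector
print(ages_demo[has_ia])
print(ages_demo[idx_ia])
```

Without na.rm = TRUE, mean() returns NA as soon as one element is missing.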
mean() function. Try in both ways!
Using the grep or grepl functions, extract the ages of people with “ia” in their names.
info=cbind.data.frame(name=c("Marie", "Gianni", "Silvia", "Laura", "Mariachiara", "Simone", "Sarah", "Francesca", "Matteo"),
goes_to_gym=c(TRUE, TRUE, FALSE, NA, FALSE, NA, FALSE, TRUE, TRUE),
times_gym= c(3, NA,0, NA,0,NA, 0, NA, 5 ),
likes=c("ski", "snowboard", "pilates", "football", "tennis", "basketball", "tennis", NA, NA))
goes_to_gym
times_gym
likes
subset function and conditions inside [ ]
Create a list info that contains the following elements (l, hour, station and mt). Be sure to assign names to the elements.
l=cbind.data.frame(Date=c("03-04", "03-05", "03-06", "03-07", "03-08", "03-09", "03-10", "03-11", "03-12"),
Temp=c(12,5,18, 20, 12, 15, 17, 15, 19))
hour=c(12, 4, 14, 11, 13, 16, 12.30, 20, 13.30 )
station=c(rep("Saluzzo(CN)", 2), rep("Montecatini(PT)", 3),"Pescia(PT)", rep("Milano(MI)", 3))
mt=rbind(c(13,12,35), c(2.6, 6.2, 9), seq(30, 50, 6.7))
Without extracting elements from the list, modify them with the following steps (you may use only the objects in the list, not the original ones):
Assign names to the hour vector using the values in the Temp column of the dataframe l
Transform station into a factor vector, fixing levels that correspond to the alphabetical order of the cities
Add a column to l that contains the values in station
Assign to hour an attribute station containing the character values of station. Notice that I said character, not factor
Order the station vector
Add to the list another vector New containing the Temp values extracted by matching station and the new column you added to the data frame (keep only the first element)
Evaluate whether the Temp column contains values in the range 19:40
Extract the indexes of the elements that are equal to 15 in hour
Create a new logical vector by evaluating in station the cities that are not in the “PT” province. Hint: explore the parameters of the function you will use. Then add this vector to the list
How many elements does your list contain?
Create a new matrix mt2 of the same dimensions as mt that contains the mt values if they are less than 10 and the mt values +1 if they are greater than or equal to 10. Add this to the list.
Using only a combination of [[ ]] and [ ], extract the second element of the fifth element of the list
Extract the second column of the mt matrix from the list
Extract all elements, except the second, of the list
Extract the data frame contained in the list and assign it to a variable data_frame
Subset data_frame by keeping only the data for the stations “Milano(MI)” and “Saluzzo(CN)” and reassign it to the variable data_frame
Concatenate data_frame with the following dataframe data2 and reassign the result to data_frame
data2=cbind.data.frame(Date=c("03-07", "03-08", "03-09", "03-10", "03-11", "03-12", "03-04", "03-05", "03-06", "03-09", "03-10", "03-11", "03-12"),
Temp=c(11, 12, 14, 9, 8, 13, 14, 15, 6, 10, 18,14, 13),
station=c(rep("Milano(MI)", 6), rep("Saluzzo(CN)",7)))
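The kinds of operations listed in the steps above (changing elements while they stay inside the list, combining [[ ]] with [ ], stacking data frames) can be sketched on a toy list. The list lst, its element names and all values are hypothetical; this is one possible approach, not the reference solution.

```r
# Hypothetical toy list (not the exercise objects)
lst <- list(
  df = data.frame(day = c("d1", "d2", "d3"), temp = c(12, 25, 8)),
  v = c(3, 15, 15),
  place = c("Rome(RM)", "Pisa(PI)", "Bari(BA)"),
  m = rbind(c(1, 12), c(30, 4))
)

# Modify elements in place, always addressing them through the list
names(lst$v) <- lst$df$temp
lst$place <- factor(lst$place, levels = sort(unique(lst$place)))
lst$df$place <- lst$place
attr(lst$v, "place") <- as.character(lst$place)

# Test a range, find positions, and match a literal (non-regex) pattern
any_in_range <- any(lst$df$temp %in% 19:40)
idx_15 <- which(lst$v == 15)
not_pi <- !grepl("(PI)", as.character(lst$place), fixed = TRUE)
lst$not_pi <- not_pi

# ifelse() is vectorised and keeps the matrix dimensions
lst$m2 <- ifelse(lst$m < 10, lst$m, lst$m + 1)

# [[ ]] picks one list element, [ ] then indexes inside it
second_of_v <- lst[["v"]][2]
second_col <- lst[["m"]][, 2]
without_second <- lst[-2]

# rbind() stacks data frames that share the same column names
extra <- data.frame(day = "d4", temp = 19, place = "Pisa(PI)")
stacked <- rbind(lst$df, extra)
print(stacked)
```

The argument fixed = TRUE makes grepl() treat the parentheses as plain characters rather than as a regular-expression group.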
Using table(), evaluate how many times each Temp value occurs in each city
Using dcast, expand the data frame using as reference the columns Date and station (reassign to data_frame)
Assign as rownames the values in the column Date
Remove the column Date from the dataframe
Inspect the result with str() and typeof()
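A base-R sketch of the counting, row-name and inspection steps is given below on toy data. The data frames long and wide are hypothetical. The reshaping itself is not shown: dcast() comes from add-on packages (for example reshape2 or data.table), and this sketch deliberately stays in base R by starting from an already wide table.

```r
# Two-way counts from "long" data: how often each value occurs per city
long <- data.frame(
  city = c("north", "north", "north", "south", "south"),
  temp = c(12, 12, 15, 15, 9)
)
counts <- table(long$temp, long$city)
print(counts)

# Hypothetical data already in "wide" form (one column per city)
wide <- data.frame(
  day = c("d1", "d2", "d3"),
  north = c(12, 12, 15),
  south = c(15, 9, 12)
)

# Use one column as row names, then drop that column
rownames(wide) <- wide$day
wide$day <- NULL

# Inspect structure and storage type
str(wide)
print(typeof(wide))        # a data frame is stored as a list
print(typeof(wide$north))
```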
Create a dataframe df1 using the vectors samples, genes and expression_counts (binding by columns) and another dataframe df2 using the vectors FOXA1, MYC, AR, ENSG00000281133, FAM138A (binding by rows). Assign the following colnames to df2: gene_name, gene_type, gene_id.
Then:
Add to df1 the columns gene_id and gene_type by taking the information in df2
protein coding genes from df1
Create a vector cases by grepping from df1 the string “MOD” and keeping only unique values
Create a vector controls by grepping from df1 the string “NORM” and keeping only unique values
cases vector
controls vector
Create a vector samples by concatenating cases and controls
Add a column annotation to df1 in which you have to put “case” and “control” values by matching the column samples in df1 with the values in the vector samples
samples=c(rep("MOD01",5),
rep("NORM01",5),
rep("NORM02", 5),
rep("MOD02", 5),
rep("MOD03", 5),
rep("NORM03", 5) )
genes=rep(c("FOXA1","AR","MYC","ENSG00000281133","FAM138A"), 6)
expression_counts=rnorm(30, 25, 4)
FOXA1=c("FOXA1", "protein_coding","ENSG00000129514.8")
AR=c("AR","protein_coding","ENSG00000169083.18")
MYC=c("MYC","protein_coding","ENSG00000136997.21")
ENSG00000281133=c("ENSG00000281133","pseudogene" ,"ENSG00000281133.1")
FAM138A=c("FAM138A", "lncRNA","ENSG00000237613.2")
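The lookup and labelling steps of this exercise can be sketched on a much smaller, made-up example. Every name below (d1, d2, samples_demo, the genes G1 and G2, the IDs) is hypothetical, and match() plus ifelse() is only one of several ways to do this; merge() would be another. The sketch assumes R 4.0 or later, where strings are not converted to factors by default.

```r
# Hypothetical toy vectors (made-up sample and gene names)
samples_demo <- c(rep("MOD_a", 2), rep("NORM_a", 2))
genes_demo <- rep(c("G1", "G2"), 2)
counts_demo <- c(10, 20, 30, 40)

# Bind vectors as columns
d1 <- cbind.data.frame(samples = samples_demo, genes = genes_demo,
                       counts = counts_demo)

# Bind vectors as rows, then turn the character matrix into a data frame
G1 <- c("G1", "protein_coding", "ID1")
G2 <- c("G2", "lncRNA", "ID2")
d2 <- as.data.frame(rbind(G1, G2))
colnames(d2) <- c("gene_name", "gene_type", "gene_id")

# Look up each gene of d1 in d2 and copy the annotation across
pos <- match(d1$genes, d2$gene_name)
d1$gene_id <- d2$gene_id[pos]
d1$gene_type <- d2$gene_type[pos]

# Keep only the protein-coding rows
coding <- d1[d1$gene_type == "protein_coding", ]

# grep(value = TRUE) returns the matching strings; unique() drops repeats
cases_demo <- unique(grep("MOD", d1$samples, value = TRUE))
controls_demo <- unique(grep("NORM", d1$samples, value = TRUE))

# Label each row by membership in the cases vector
d1$annotation <- ifelse(d1$samples %in% cases_demo, "case", "control")
print(d1)
```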
Write a function get_info that takes as input a number and a dataframe and returns as output:
“LOW” if the number is lower than 10
“HIGH” if the number is greater than 40
otherwise, the number of values in the column expression_counts of df1 (previous exercise) that are lower than that number
Then:
Create a vector gene_name = c("FOXA1","SRSF1","MYC","PTBP1","AR"). Then, use a for loop to iterate over the elements in the vector. For each element, print “Present” if it is found in the genes column of df1, or “Not Present” otherwise. (HINT: try to use "\n" to write each output on a new line)
Use a for loop to iterate over the elements in the gene_name vector. If an element is not found in the genes column of df1, skip to the next element. Otherwise, select the rows corresponding to that gene and display the average of the expression_counts column.
Initialize a variable total<-0. Use a for loop to iterate over the elements in the expression_counts column of df1. At each iteration, add the new value to total. The loop should terminate once total exceeds 150. Display the final value of total
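The control-flow pieces used above (if / else if inside a function, next, break) can be sketched on toy data. The function classify_number, the data frame toy and its columns gene and counts are hypothetical stand-ins, so this is an illustration rather than the expected solution.

```r
# Hypothetical helper, loosely modelled on the description above
classify_number <- function(x, df) {
  if (x < 10) {
    "LOW"
  } else if (x > 40) {
    "HIGH"
  } else {
    # count the values in the (assumed) column `counts` that are below x
    sum(df$counts < x)
  }
}

toy <- data.frame(gene = c("G1", "G2", "G1", "G3"),
                  counts = c(60, 50, 45, 20))

print(classify_number(5, toy))
print(classify_number(55, toy))
print(classify_number(30, toy))

# Membership test inside a loop; "\n" ends each printed line
for (g in c("G1", "G9")) {
  if (g %in% toy$gene) cat(g, "Present\n") else cat(g, "Not Present\n")
}

# `next` skips the rest of the current iteration
for (g in c("G9", "G1")) {
  if (!(g %in% toy$gene)) next
  cat(g, "mean:", mean(toy$counts[toy$gene == g]), "\n")
}

# `break` leaves the loop as soon as the running total passes the limit
total <- 0
for (value in toy$counts) {
  total <- total + value
  if (total > 150) break
}
print(total)
```

Note that the function returns a character string in two branches and a number in the third, so the type of the result depends on the input.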
Given the list g:
NaN (use the helper in RStudio).
NA, NaN or infinite values.
g=list(Value=c(NaN,32, NA,39, Inf, -Inf, 8.9, 4 ),
Mat=matrix(c(1:9, NA, NaN, 989:103, Inf, NA, 10^7, 9^5, 6*7, 5/3, 6+2, 5-7), ncol=6, byrow = T),
Df=cbind.data.frame(place=factor(c("Garden", "House", "Square", NA)), N=c(NA, 5, 7, NaN))
)
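As a reminder of how R tells these special values apart, here is a minimal sketch on a toy vector and a toy matrix. The objects x and m are hypothetical and much smaller than g.

```r
# Hypothetical toy vector mixing ordinary and special values
x <- c(1, NA, NaN, Inf, -Inf, 4)

na_flags <- is.na(x)          # TRUE for both NA and NaN
nan_flags <- is.nan(x)        # TRUE only for NaN
inf_flags <- is.infinite(x)   # TRUE for Inf and -Inf
ok_flags <- is.finite(x)      # TRUE only for ordinary numbers

# Keep the ordinary values
clean <- x[ok_flags]
print(clean)

# The same predicates work element-wise on a numeric matrix
m <- matrix(c(1, NA, Inf, 4), ncol = 2)
bad_positions <- which(!is.finite(m), arr.ind = TRUE)
print(bad_positions)
```

With arr.ind = TRUE, which() reports row and column indices instead of positions in the underlying vector.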