R is a functional programming (FP) language.
An R program is largely a collection of functions: objects that map some input to some output.
All R functions have three parts:
Part | Description |
---|---|
formals() | the list of arguments which controls how you can call the function. |
body() | the code inside the function. |
environment() | the map of the location of the function’s variables. |
f <- function(x) { "ciao" }
f
## function(x) { "ciao" }
# the list of arguments which controls how you can call the function
formals(f)
## $x
#the code inside the function.
body(f)
## {
## "ciao"
## }
#the map of the location of the function's variables.
environment(f)
## <environment: R_GlobalEnv>
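The parts are not only readable: the replacement forms `formals<-` and `body<-` also let you change them. A minimal sketch, where `greet` is a hypothetical function used only for illustration:

```r
greet <- function(x) { "ciao" }

# replace the body: the function now uses its argument
body(greet) <- quote(paste("ciao", x))
greet("Fabio")

# replace the formals: add a second argument with a default value
formals(greet) <- alist(x = , greeting = "ciao")
body(greet) <- quote(paste(greeting, x))
greet("Fabio", greeting = "hello")
```

`alist()` is used because it allows an argument with no default (`x = `).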
There is one exception to the rule that functions have three components.
Primitive functions, e.g. sum(), call C code directly with .Primitive() and contain no R code.
Therefore their formals(), body(), and environment() are all NULL:
sum
## function (..., na.rm = FALSE) .Primitive("sum")
formals(sum)
## NULL
body(sum)
## NULL
environment(sum)
## NULL
The syntax for R functions is:
foo = function(arg1, arg2, arg3, ...){
  # ...do something...
  # the value of the last evaluated expression is returned
}
You can declare named arguments, or leave room for unspecified ones: using the triple-dots (...) in the argument list allows the function to accept any number of additional arguments.
foo = function(x,y,...){
  print(x)
  ## collect the unspecified parameters in a named list
  args = list(...)
  if("z" %in% names(args) ){
    print(args$z)
  }
}
foo(5, 3, z="what's up?" )
## [1] 5
## [1] "what's up?"
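Besides collecting them with list(...), the dots are often used to forward extra arguments to another function. A minimal sketch; `my_mean` and `count_extra` are hypothetical names:

```r
my_mean <- function(x, ...) {
  # every argument not matched to x is passed on unchanged to mean()
  mean(x, ...)
}
my_mean(c(1, NA, 3))                # NA: mean() did not receive na.rm
my_mean(c(1, NA, 3), na.rm = TRUE)  # 2

# counting how many extra arguments were captured by the dots
count_extra <- function(...) length(list(...))
count_extra(1, "a", TRUE)           # 3
```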
Scoping (noun): “the act or practice of eyeing or examining, as in order to evaluate or appreciate”.
The scope is the context within your computer program where a variable or an identifier can be used, or within which a declaration has effect.
Scoping is the set of rules that control the way R picks up the value of a variable.
a <- 1
The <- operator is the assignment operator.
Given the expression a <- 1, the value 1 is assigned to the variable a in the current environment.
If the variable was already assigned in the same environment, the new assignment overwrites it.
Variable assignments only update the current environment.
When R looks for the value of a given variable, it starts from the innermost environment: the current environment is inspected first, then its enclosing environment, and so on. The search goes on until either the value is found or the empty environment is reached, in which case R signals an error.
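A minimal sketch of this lookup chain; `outer_fun` and `inner_fun` are hypothetical names. The inner function finds x in its enclosing environment, while the assignment inside `outer_fun` leaves the global x untouched:

```r
x <- "global"
outer_fun <- function() {
  x <- "outer"            # creates a new x in outer_fun's environment
  inner_fun <- function() {
    x                     # not defined here: R looks in the enclosing environment
  }
  inner_fun()
}
outer_fun()   # "outer"
x             # still "global": the assignment only updated outer_fun's environment
```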
R has two types of scoping:
Scoping | Meaning | Usage |
---|---|---|
Static (lexical) | a variable refers to its binding in the environment where the function was defined. | implemented automatically at the language level |
Dynamic | a variable refers to its binding in the environment the function was called from. | used in select functions to save typing during interactive analysis |
Static scoping is determined by the structure of the source code: what matters is where a function is defined, not where it is called.
There are four basic principles behind R’s implementation of static scoping:
name masking
functions and variables
a fresh start
dynamic lookup
If a name is NOT defined inside a function, R will look one level up.
f <- function() {
x <- 1
y <- 2
c(x, y)
}
f()
## [1] 1 2
rm(f)
# A __________
x <- 1
g <- function() {
# B __________
y <- 2
c(x, y)
# __________ B
}
g()
## [1] 1 2
rm(x, g)
# __________ A
The same principles apply regardless of the type of associated value: the closest level wins.
The same rules of name masking apply if a function is defined inside another function:
# first function __________________
l <- function(x) x + 1
# second function ________________
m <- function() {
# first function's name again, masked inside m() __________
l <- function(x) x * 2
l(10)
}
m()
#> [1] 20
rm(l, m)
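There is one twist: when a name is used in a function call, R ignores objects that are not functions while searching. A minimal sketch with the hypothetical names `n` and `o`:

```r
n <- function(x) x / 2
o <- function() {
  n <- 10      # a local object named n that is NOT a function
  n(n)         # call position: R skips the local number and finds the global function
}
result <- o()
result         # 5
rm(n, o)
```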
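What happens if you call an operation on a variable that has NOT been initialized (assuming no a has been defined yet)?
a <- a + 1
# Error: object 'a' not found
To avoid it you can first test whether the variable exists with the function exists(). Note that exists() also searches the enclosing environments: here it finds the global a <- 1 assigned earlier, so j() returns 2: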
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
a
}
j()
## [1] 2
rm(j)
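This is also where the "fresh start" principle shows up: every call creates a new environment, so a local variable does not survive between calls. A minimal sketch with a hypothetical `counter()`, where `inherits = FALSE` restricts exists() to the function's own environment:

```r
counter <- function() {
  # look only in the environment of the current call
  if (!exists("n_calls", envir = environment(), inherits = FALSE)) {
    n_calls <- 1
  } else {
    n_calls <- n_calls + 1
  }
  n_calls
}
first <- counter()
second <- counter()
c(first, second)   # 1 1: the local n_calls was discarded when the first call ended
```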
Static scoping determines WHERE to look for values, not WHEN to look for them.
R looks for values when the function is run, not when it’s created.
This means that the output of a function can be different depending on objects outside its environment:
# function recalling an external variable
f <- function() x
# time 0 _________
x <- 15
f()
#> [1] 15
# ... time 1 _________
x <- 20
f()
#> [1] 20
You generally want to AVOID this behaviour.
Declaration of variables is important: if you make a spelling mistake in your code, you won’t get an error when you create the function.
One way to detect this problem is the findGlobals() function from codetools. This function lists all the external dependencies (i.e. functions and variables) of a function:
f <- function() x + 1
codetools::findGlobals(f)
#> [1] "+" "x"
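For example, a misspelled variable shows up among the globals. A minimal sketch with a hypothetical `total()` containing the typo `valeus`; `merge = FALSE` makes findGlobals() return functions and variables separately:

```r
total <- function(values) sum(valeus)   # typo: valeus instead of values
# no error yet: the function is created without complaint
globals <- codetools::findGlobals(total, merge = FALSE)
globals$functions   # "sum"
globals$variables   # "valeus": the misspelled name is flagged as external
```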
R supports two additional syntaxes for calling special types of functions:
infix and replacement functions.
Infix functions are functions where the function name comes in between its arguments, like + or -.
All user-created infix functions must start and end with %. R comes with the following infix functions predefined:
Infix | Description |
---|---|
%in% | Matching operator |
%% | Remainder operator |
%*% | Matrix multiplication |
%/% | Integer division |
%o% | Outer product |
%x% | Kronecker product |
The complete list of built-in infix operators that don’t need % is: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-
To create a new operator:
`%+%` <- function(a, b) paste0(a, b)
"new" %+% " string"
#> [1] "new string"
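Every infix operator is still an ordinary function, so it can also be called in prefix form by quoting its name with backticks, or passed to another function. A minimal sketch:

```r
`%+%` <- function(a, b) paste0(a, b)

`+`(1, 2)                  # 3, same as 1 + 2
`%+%`("new", " string")    # "new string", same as "new" %+% " string"
"a" %+% "b" %+% "c"        # "abc": operators can be chained
sapply(1:3, `+`, 10)       # 11 12 13: the operator is used as a regular function
```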
Replacement functions have names of the form function_name<- and act as if they modify their arguments in place.
They:
typically have two arguments (x and value), although they can have more;
must return the modified object.
For example, the following function allows you to modify the second element of a vector:
# Replacement function (two parameters)
`replace_the_second<-` <- function(x, value) {
x[2] <- value
x
}
x <- 1:10
# Call of the replacement function (the new value goes on the right-hand side)
replace_the_second(x) <- 5
x
#> [1] 1 5 3 4 5 6 7 8 9 10
When R evaluates the assignment replace_the_second(x) <- 5, it notices that the left-hand side of the <- is not a simple name, so it looks for a function named replace_the_second<- to do the replacement.
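In practice the call is turned into an ordinary function call followed by an assignment, so the object is replaced by a modified copy. The sketch below shows the equivalent explicit call, plus a hypothetical `modify<-` with an extra argument placed between x and value:

```r
`replace_the_second<-` <- function(x, value) {
  x[2] <- value
  x
}
x <- 1:10
# what R effectively runs for replace_the_second(x) <- 5
x <- `replace_the_second<-`(x, value = 5)

# hypothetical replacement function with an additional argument
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
modify(x, 1) <- 10   # runs x <- `modify<-`(x, 1, value = 10)
x                    # 10 5 3 4 5 6 7 8 9 10
```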
The missing() function can be used to test whether a value was specified as an argument to a function.
foo = function(x,y,...){
print(x)
## ___MISSING___ ##
if ( missing(y) ){
cat("y is not specified\n")
}else{
print(y)
args = list(...)
if('z' %in% names(args)) print(args$z)
}
}
foo(5, z="what's up?" )
[1] 5
y is not specified
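Note that missing() reports whether the caller supplied the argument, not whether it has a value: it is still TRUE when a default value is used. A minimal sketch with a hypothetical `bar()`:

```r
bar <- function(y = 1) {
  list(missing = missing(y), value = y)
}
bar()    # missing is TRUE even though y has the default value 1
bar(1)   # missing is FALSE: the same value, but explicitly supplied
bar(5)   # missing is FALSE, value is 5
```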
The stop() function stops execution of the current expression and executes an error action.
foo = function(x,y,...){
if (missing(y)){
# STOP
stop("y is not specified, please STOP\n")
}else{
print(x)
print(y)
args = list(...)
if("z" %in% names(args)) print(args$z)
}
}
foo(5, z="what's up?" )
Error in foo(5, z = "what's up?") : y is not specified, please STOP
The geterrmessage() function gives the last error message.
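A minimal sketch: try() with `silent = TRUE` (try() itself is discussed further on) records the error without printing it, and geterrmessage() then returns the message as a string:

```r
try(log("H5"), silent = TRUE)
msg <- geterrmessage()
msg   # contains "non-numeric argument to mathematical function"
```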
The warning() function generates a warning message.
foo = function(x,y,...){
if (missing(y)){
## WARNINGS
warning("foo: y is not specified\n")
}
print(x)
args = list(...)
if("z" %in% names(args))
print(args$z)
}
foo(5, z="what's up?" )
[1] 5
[1] "what's up?"
Warning message:
In foo(5, z = "what's up?") : foo: y is not specified
The warnings() function prints the last warnings that were generated.
The condition system provides a mechanism for signaling and handling unusual conditions, such as errors and warnings.
try()
The most straightforward way to handle an error is to wrap the problematic call in a try() block. Consider first this loop, which stops at the first error:
x = list( 1, 2, -2, 'H5', 0, 10)
str(x)
for(i in x) {
print(i)
cat("log of ", i, " = ", log(i),"\n-----------\n")
}
[1] 1
log of 1 = 0
-----------
[1] 2
log of 2 = 0.6931472
-----------
[1] -2
log of -2 = NaN
-----------
[1] "H5"
Error in log(i) : non-numeric argument to mathematical function
In addition: Warning message:
In log(i) : NaNs produced
To prevent this behaviour you can use a try() block:
x = list( 1, 2, -2, 'H5', 0, 10)
str(x)
for(i in x) {
print(i)
### TRY BLOCK
try(
cat("log of ", i, " = ", log(i),"\n-----------\n")
)
}
[1] 1
log of 1 = 0
-----------
[1] 2
log of 2 = 0.6931472
-----------
[1] -2
log of -2 = NaN
-----------
[1] "H5"
Error in log(i) : non-numeric argument to mathematical function
[1] 0
log of 0 = -Inf
-----------
[1] 10
log of 10 = 2.302585
-----------
Warning message:
In log(i) : NaNs produced
With try(), the error no longer halts the loop, which continues with the rest of the input; the warning is reported once the loop has finished.
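try() also returns an object of class "try-error" when the call fails, which makes it easy to separate failures from results. A minimal sketch on the same input (log(-2) still returns NaN with a warning, since that is not an error):

```r
x = list(1, 2, -2, 'H5', 0, 10)
# silent = TRUE: record the error without printing it
res = lapply(x, function(i) try(log(i), silent = TRUE))
# TRUE where the call failed
failed = vapply(res, inherits, logical(1), what = "try-error")
failed
# keep only the successful results
unlist(res[!failed])
```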
tryCatch()
Sometimes you want to perform an operation and catch its errors and warnings.
This can be done with tryCatch(), which allows you to write specific error and warning handlers.
Function | Summary |
---|---|
tryCatch() | Evaluates an expression and, if an error or warning is signalled, calls the matching handler and returns its value. |
# declaration
foo = function(z,
## WARNING FUNCTION
warning = function(w) {
print( paste('warning:',w) );
},
## ERROR FUNCTION
error = function(e) {
print(paste('error:',e));
}
){
## TRY CATCH BLOCK
tryCatch(
{
print(paste("attempt log operation for z:",z))
return(log(z))
}
,warning = warning
,error = error )
}
#execution ---
foo(2)
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# executes & invokes the WARNING’s handler ---
foo(-2)
## [1] "attempt log operation for z: -2"
## [1] "warning: simpleWarning in log(z): NaNs produced\n"
# executes & invokes the ERROR’s handler ---
foo("H5")
## [1] "attempt log operation for z: H5"
## [1] "error: Error in log(z): non-numeric argument to mathematical function\n"
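Handlers receive the condition object, so instead of pasting the whole object you can extract just its text with conditionMessage(). tryCatch() also accepts a `finally` expression that is evaluated in every case. A minimal sketch; `safe_log` is a hypothetical name:

```r
safe_log <- function(z) {
  tryCatch(
    log(z),
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e)),
    finally = cat("done with", format(z), "\n")  # always evaluated
  )
}
safe_log(2)     # 0.6931472
safe_log(-2)    # "warning: NaNs produced"
safe_log("H5")  # "error: non-numeric argument to mathematical function"
```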
Sometimes you want to substitute the return value when errors or warnings are signalled.
Function | Summary |
---|---|
invokeRestart() | Transfers control to the point where the specified restart was established and calls the restart’s handler with the given arguments. |
withRestarts() | Establishes restarts and describes the action each restart takes. |
Suppose I want to calculate the log of a value until I get a result.
How can I control the warnings and errors raised by log(), so that I can stay calm?
Combine tryCatch() + invokeRestart() + withRestarts():
foo = function(z,
## WARNING FUNCTION with restart
warning = function(w) {
print( paste('warning:',w) );
invokeRestart("correctArgForWarnings")
},
## ERROR FUNCTION with restart
error = function(e) {
print(paste('error:',e));
invokeRestart("correctArgForErrors")
}
){
## Loop is repeated until a break is specified
repeat
## 1. catch errors *********************
withRestarts(
## 2. catch warnings =================
withRestarts(
## TRY CATCH BLOCk -----------------
tryCatch(
{
print(paste("attempt log operation for z:",z))
return(log(z))
} # return break the repeat loop
,warning = warning
,error = error )
##------------------------------------
, correctArgForWarnings = function() {z <<- -z} )
##=================================
, correctArgForErrors = function() {z <<- 1})
##*********************************
}
foo(2)
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# invokes the warning’s handler
foo(-2)
## [1] "attempt log operation for z: -2"
## [1] "warning: simpleWarning in log(z): NaNs produced\n"
## [1] "attempt log operation for z: 2"
## [1] 0.6931472
# invokes the error’s handler
foo("H5")
## [1] "attempt log operation for z: H5"
## [1] "error: Error in log(z): non-numeric argument to mathematical function\n"
## [1] "attempt log operation for z: 1"
## [1] 0
Function | Summary |
---|---|
debug() | Set, unset or query the debugging flag on a function |
browser() | Interrupt the execution of an expression and allow the inspection of the environment. |
traceback() | Prints the call stack of the last uncaught error. |
> debug(lsfit6)
> lsfit6(X, y)
debugging in: lsfit6(X, y)
debug at #1: {
solve.default(crossprod(X), crossprod(X, y))
}
Browse[2]> crossprod(X)
(Intercept) incidence I(incidence^2) I(incidence^3)
(Intercept) 20.000 37.7000 81.1900 191.9870
incidence 37.700 81.1900 191.9870 483.4555
I(incidence^2) 81.190 191.9870 483.4555 1270.0715
I(incidence^3) 191.987 483.4555 1270.0715 3435.4413
> undebug(lsfit6)
> foo = function(){ 1994 + "You go out and it’s on" }
> fighters = function() { print("Yeah, whatever it is"); foo()}
> fighters()
[1] "Yeah, whatever it is"
Error in 1994 + "You go out and it’s on" :
non-numeric argument to binary operator
> traceback()
2: foo() at #1
1: fighters()
Type of exec | Condition | Example |
---|---|---|
CONDITIONAL | if | if (cond) expr1 else expr2 |
CONDITIONAL | ifelse | ifelse( cond, expr1, expr2 ) |
REPETITIVE | for | for ( i in expr1 ) expr2 |
REPETITIVE | while | while (cond) expr |
REPETITIVE | repeat | repeat expr |
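The two conditional forms differ in an important way: if evaluates a single TRUE/FALSE condition, whereas ifelse() is vectorised and returns one result per element of cond. A minimal sketch with a hypothetical vector x:

```r
x <- c(-1, 2, -3)
# one result per element
ifelse(x > 0, "positive", "non-positive")
# if needs a single TRUE/FALSE, so reduce the vector first
if (all(x > 0)) "all positive" else "not all positive"
```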
The break statement can be used to terminate ANY loop (and it is the only way to terminate a repeat loop).
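A minimal sketch of break: the repeat loop below has no condition of its own, so break is the only way out, while the equivalent while loop puts the condition in its header. Both add 1, 2, 3, … until the running total exceeds 10:

```r
total <- 0
i <- 0
repeat {
  i <- i + 1
  total <- total + i
  if (total > 10) break   # without this line the loop would never end
}
c(i, total)   # 5 15

# the same computation with while
total2 <- 0
j <- 0
while (total2 <= 10) {
  j <- j + 1
  total2 <- total2 + j
}
c(j, total2)  # 5 15
```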
The factorial function $n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$ can be defined recursively as $n! = f(n) = n \cdot f(n-1)$, with $f(1) = 1$.
Let’s see how we can implement it:
## Recursion
fact.rec = function(n){
ifelse (n==1, 1, (n * fact.rec(n-1) ) )
}
## Iteration
fact.it = function(n){
  ans = 1
  # seq_len(n) is just 1 when n = 1, whereas 2:n would count down to c(2, 1)
  for (ii in seq_len(n)) ans = ans * ii
  ans
}
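A quick sanity check, as a sketch: both versions should agree with base R's factorial(). The edge case n = 1 matters for the loop: iterating over 2:n would run twice there (2:1 is c(2, 1)) and return 2:

```r
fact.rec = function(n){
  ifelse(n == 1, 1, n * fact.rec(n - 1))
}
fact.it = function(n){
  ans = 1
  for (ii in seq_len(n)) ans = ans * ii
  ans
}
for (n in c(1, 5, 10)) {
  stopifnot(isTRUE(all.equal(fact.rec(n), factorial(n))),
            isTRUE(all.equal(fact.it(n), factorial(n))))
}
2:1   # 2 1: why looping over 2:n is wrong for n = 1
```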
A simple way to benchmark: how long does each version take?
system.time( fact.rec(100) )["elapsed"]
## elapsed
## 0.003
system.time( fact.it(100) )["elapsed"]
## elapsed
## 0.002
library(rbenchmark)
# benchmark() is a simple wrapper around system.time()
benchmark( fact.rec(15)
, fact.it(15)
, order="relative"
, replications=5000
)
## test replications elapsed relative user.self sys.self user.child sys.child
## 2 fact.it(15) 5000 0.012 1.000 0.012 0.000 0 0
## 1 fact.rec(15) 5000 0.134 11.167 0.132 0.003 0 0
The recursive version is “conceptually attractive”; the iterative version less so.
The recursive version is computationally more expensive. Overhead:
in time: every function call has a cost;
in memory usage: computing fact.rec(100) requires fact.rec(99), which requires … fact.rec(1). So fact.rec(100) cannot be completed before fact.rec(1) is completed, and all the pending calls must be kept in memory meanwhile.
A functional is a function that takes a function as an input and returns a vector as output.
Function | Summary |
---|---|
lapply() | Applies a function to each element of a list and returns a list |
sapply() | Applies a function to each element of a list and returns a vector/matrix |
tapply() | Applies a function to each group of values of a vector, with groups defined by factors |
apply() | Applies a function to margins of an array or matrix |
mapply() | Applies a function to the corresponding elements of several lists or vectors |
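A minimal sketch of each functional in the table, on small hypothetical inputs:

```r
m <- matrix(1:6, nrow = 2)               # 2 x 3 matrix, filled by column
apply(m, 1, sum)                         # row sums: 9 12
apply(m, 2, sum)                         # column sums: 3 7 11

values <- c(10, 20, 30, 40)
groups <- c("a", "b", "a", "b")
tapply(values, groups, mean)             # a: 20, b: 30

lapply(list(1:3, 4:6), range)            # a list of two vectors
sapply(list(1:3, 4:6), mean)             # simplified to a vector: 2 5
mapply(function(x, y) x + y, 1:3, 4:6)   # element-wise: 5 7 9
```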
The simplest functional is lapply(), which takes a function, applies it to each element in a list, and returns the results in the form of a list. lapply() is the building block for many other functionals, so it’s important to understand how it works.
Here’s a pictorial representation:
These functions are alternatives to explicit loops.
lapply() makes it easier to work with lists by eliminating much of the boilerplate associated with looping.
lapply() is written in C for performance, but we can obtain the same result with a for loop:
l = list(a=1:3,b=4)
ans = vector("list",length(l))
names(ans) = names(l)   # lapply() keeps the names of the input list
for (ii in seq_along(l)){
  ans[[ii]] = c(
    length(l[[ii]])
    ,mean(l[[ii]])
  )
}
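To check that the loop really reproduces lapply(), here is a minimal self-contained sketch that builds both results and compares them:

```r
l = list(a = 1:3, b = 4)

# explicit loop: allocate, name, fill by index
ans = vector("list", length(l))
names(ans) = names(l)
for (ii in seq_along(l)) {
  ans[[ii]] = c(length(l[[ii]]), mean(l[[ii]]))
}

# same computation with lapply(): no index bookkeeping
ans2 = lapply(l, function(v) c(length(v), mean(v)))
identical(ans, ans2)   # TRUE
```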
Benchmarking
f.lapply = function(my.list){
lapply( my.list, function(x){
c(mean(x),length(x))
})
}
f.forloop = function(my.list) {
  ans = vector("list",length(my.list))
  for (ii in seq_along(my.list)){
    ans[[ii]] = c(mean(my.list[[ii]])
                  ,length(my.list[[ii]])
    )
  }
  ans   # return the list after the loop (inside the loop it was discarded)
}
library(rbenchmark)
set.seed(1)
N=10^(1:6)
b = vector('list', length(N))
for ( i in 1:length(N) ){
l = list( runif(N[[i]]), runif(N[[i]]) )
b[[i]] = cbind.data.frame(
"N" = N[[i]]
, benchmark(
f.lapply(l)
, f.forloop(l)
, columns=c("test", "replications", "elapsed", "relative")
, order="relative"
, replications=10000)
)
}
b = do.call ( "rbind.data.frame", b)
b
library(ggplot2)
ggplot(b, aes(x=N,y=log2(elapsed), group=test,color=test))+geom_line()+
geom_point(aes(size=elapsed), fill='white', shape=21, stroke=2)+
scale_x_log10()+
ggsci::scale_color_startrek()+
ggpubr::theme_pubr()
sapply() and vapply() are very similar to lapply() except they simplify their output to produce an atomic vector. While sapply() guesses the output type, vapply() takes an additional argument (FUN.VALUE) specifying it.
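A minimal sketch of why specifying the type helps: on an empty input sapply() cannot guess and returns a list, while vapply() returns a vector of the declared type, and vapply() fails loudly when FUN returns something unexpected:

```r
x <- list(a = 1:3, b = 4:6)
sapply(x, mean)                               # named numeric vector: a = 2, b = 5
vapply(x, mean, FUN.VALUE = numeric(1))       # same result, with a type check

sapply(list(), mean)                          # list()
vapply(list(), mean, FUN.VALUE = numeric(1))  # numeric(0)

# FUN returns a character string, but a number was declared
res <- try(vapply(x, function(v) as.character(mean(v)), FUN.VALUE = numeric(1)),
           silent = TRUE)
inherits(res, "try-error")                    # TRUE
```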
If you want to perform an operation on two lists element by element, you can use mapply():
list1 = list(c('value'=1) ,c('value'=2),c('value'=3))
list2 = list('a','b','c')
z = mapply( function (x,y){
  x = as.list(x)   # named vector -> list, so that x$names = y does not warn about coercion
  x$names = y
  x = as.data.frame(x)
  return(x)
}
,x=list1
,y=list2
, SIMPLIFY = FALSE
)
z
## [[1]]
## value names
## 1 1 a
##
## [[2]]
## value names
## 1 2 b
##
## [[3]]
## value names
## 1 3 c
z = do.call('rbind.data.frame', z)
z
## value names
## 1 1 a
## 2 2 b
## 3 3 c
A work by Matteo Cereda and Fabio Iannelli