Here we’ll explore the tibble package.
suppressWarnings(library(tidyverse))
The package documentation describes tibbles as a “modern reimagining of the data.frame […] that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.”
Most other R packages use regular data frames, so you might want to coerce a data frame to a tibble. You can do that with as_tibble():
as_tibble(iris)
## # A tibble: 150 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # … with 140 more rows
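Tibbles never carry row names, so as_tibble() drops them by default when converting a data frame. The sketch below shows one way to keep that information, using the rownames argument to move the row names into an ordinary column. The column name "model" and the object name cars_tbl are just illustrative choices.

```r
library(tibble)

# mtcars ships with base R and stores the car model in its row names
cars_tbl <- as_tibble(mtcars, rownames = "model")

is_tibble(cars_tbl)   # TRUE
names(cars_tbl)[1]    # "model": the former row names are now the first column
```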
You can create a new tibble from individual vectors with tibble(). tibble() will automatically recycle inputs of length 1, and allows you to refer to variables that you just created, as shown below.
tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y
)
## # A tibble: 5 × 3
## x y z
## <int> <dbl> <dbl>
## 1 1 1 2
## 2 2 1 5
## 3 3 1 10
## 4 4 1 17
## 5 5 1 26
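Recycling applies only to inputs of length 1. As a small illustration (the object names ok and bad are hypothetical), a length-2 vector next to a length-4 vector is rejected rather than silently repeated:

```r
library(tibble)

# A length-1 input is recycled to match the longest column
ok <- tibble(x = 1:4, y = "a")
nrow(ok)   # 4

# A length-2 input is not recycled against a length-4 column;
# tryCatch() turns the resulting error into a plain string
bad <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "recycling error"
)
bad        # "recycling error"
```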
If you’re already familiar with data.frame(), tibble() does much less:
it never changes the type of the inputs (e.g. it never converts strings to factors!);
it never changes the names of variables;
it never creates row names.
tb <- tibble(
  `:)` = "smile",
  ` ` = "space",
  `2000` = "number"
)
tb
## # A tibble: 1 × 3
## `:)` ` ` `2000`
## <chr> <chr> <chr>
## 1 smile space number
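To make the contrast with data.frame() concrete, here is a minimal side-by-side sketch. The column name `my var` and the object name named are made up for the example; data.frame() is called with its default check.names = TRUE.

```r
library(tibble)

# data.frame() repairs non-syntactic names by default (check.names = TRUE)
names(data.frame(`my var` = 1))   # "my.var"

# tibble() keeps the name exactly as written
named <- tibble(`my var` = 1)
names(named)                      # "my var"

# Backticks are needed to refer to such a column with $
named$`my var`                    # 1
```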
There are two main differences in the usage of a tibble vs. a classic data.frame: printing and subsetting.
Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen.
In addition to its name, each column reports its type, a nice feature borrowed from str():
tibble(
  a = lubridate::now() + runif(1e3) * 86400,
  b = lubridate::today() + runif(1e3) * 30,
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
## # A tibble: 1,000 × 5
## a b c d e
## <dttm> <date> <int> <dbl> <chr>
## 1 2023-02-10 05:10:22 2023-02-28 1 0.236 a
## 2 2023-02-10 10:32:21 2023-02-16 2 0.958 o
## 3 2023-02-10 00:29:15 2023-02-16 3 0.0447 o
## 4 2023-02-10 03:28:00 2023-02-15 4 0.981 i
## 5 2023-02-09 19:10:10 2023-03-03 5 0.295 u
## 6 2023-02-10 04:33:31 2023-02-13 6 0.289 p
## 7 2023-02-09 22:48:00 2023-02-15 7 0.352 b
## 8 2023-02-10 07:10:40 2023-02-15 8 0.502 l
## 9 2023-02-10 06:04:49 2023-03-01 9 0.776 l
## 10 2023-02-10 13:25:36 2023-03-07 10 0.565 i
## # … with 990 more rows
To see more than the default display, you can explicitly print() the data frame and control the number of rows (n) and the width of the display. width = Inf will display all columns:
nycflights13::flights %>%
  print(n = 10, width = Inf)
You can control the default print behaviour by setting options:
options(tibble.print_max = n, tibble.print_min = m): if there are more than n rows, print only the first m rows. Use options(tibble.print_max = Inf) to always show all rows.
Use options(tibble.width = Inf) to always print all columns, regardless of the width of the screen.
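The two approaches can be tried on a small tibble. This is only a sketch: the tibble big and the thresholds 5 and 3 are arbitrary values chosen for the example, and the previous options are restored at the end so the session is left unchanged.

```r
library(tibble)

big <- tibble(id = 1:50, value = sqrt(id))

# One-off: show only the first 3 rows of this tibble
print(big, n = 3)

# Session default: tibbles with more than 5 rows print only their first 3
old <- options(tibble.print_max = 5, tibble.print_min = 3)
big
options(old)  # restore the previous settings
```

Saving the return value of options() and passing it back is a common way to keep such changes temporary.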
If you want to pull out a single variable, you need some new tools, $ and [[. [[ can extract by name or position; $ only extracts by name but is a little less typing.
df <- tibble(
  x = runif(5),
  y = rnorm(5)
)
# Extract by name
df$x
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
df[["x"]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
# Extract by position
df[[1]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
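One reason $ and [[ matter with tibbles is that single-bracket subsetting behaves differently from a data frame. The sketch below, using the hypothetical objects df_base and df_tbl, compares the two: selecting one column with [ simplifies a data frame to a vector, while a tibble stays a tibble.

```r
library(tibble)

df_base <- data.frame(x = 1:3, y = 4:6)
df_tbl  <- tibble(x = 1:3, y = 4:6)

class(df_base[, "x"])   # "integer": the data frame is simplified to a vector
class(df_tbl[, "x"])    # still a tibble, with a single column
class(df_tbl[["x"]])    # "integer": [[ returns the vector itself
```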
To use these in a pipe, you’ll need to use the special placeholder, a dot (.):
# Extract in pipe
df %>% .$x
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
df %>% .[["x"]]
## [1] 0.6065414 0.6835919 0.9006848 0.3854084 0.9889426
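An alternative that avoids the placeholder is dplyr::pull(), which extracts a column by name or by position. This is a swapped-in technique rather than the approach shown above, and the tibble df_small is made up for the example.

```r
library(tibble)
library(dplyr)

df_small <- tibble(x = c(0.2, 0.5, 0.9), y = c(1, 2, 3))

df_small %>% pull(x)   # by name:     0.2 0.5 0.9
df_small %>% pull(1)   # by position: 0.2 0.5 0.9
```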
Compared to a data.frame, tibbles are more strict: they never do partial matching:
# Partial matching
d <- data.frame(alpha = runif(10), beta = runif(10))
d$al
## [1] 0.6276676 0.3800857 0.2967158 0.8523114 0.5388956 0.8361498 0.8964996 0.8116371
## [9] 0.3043738 0.2367873
# surly tibble
t <- tibble(alpha = runif(10), beta = runif(10))
t$al
## Warning: Unknown or uninitialised column: `al`.
## NULL
Tibbles also generate a warning if the column you are trying to access does not exist, and return NULL instead of a partial match.
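In scripts it can be useful to detect this situation programmatically rather than read the console. A minimal sketch, assuming the illustrative objects scores_df and scores_tbl, captures the warning with tryCatch() and shows a name check that avoids it:

```r
library(tibble)

scores_df  <- data.frame(alpha = 1:3, beta = 4:6)
scores_tbl <- tibble(alpha = 1:3, beta = 4:6)

# The data frame silently resolves `al` to `alpha`
identical(scores_df$al, scores_df$alpha)   # TRUE

# The tibble warns; tryCatch() captures the warning message as a string
msg <- tryCatch(
  scores_tbl$al,
  warning = function(w) conditionMessage(w)
)
msg

# Checking the name first avoids the warning altogether
"al" %in% names(scores_tbl)                # FALSE
```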
Some older functions don’t work with tibbles. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data.frame:
class(as.data.frame(tb))
## [1] "data.frame"
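The conversion works in both directions, so a tibble can be handed to an older function as a data frame and the result turned back afterwards. A short round-trip sketch, with tb_small and plain as hypothetical names:

```r
library(tibble)

tb_small <- tibble(id = 1:3, label = c("a", "b", "c"))

# Drop the tibble classes before calling a function that expects a data.frame
plain <- as.data.frame(tb_small)
is_tibble(plain)              # FALSE
class(plain)                  # "data.frame"

# Convert back once the older function has done its work
is_tibble(as_tibble(plain))   # TRUE
```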
A work by Matteo Cereda and Fabio Iannelli