R can read several file formats through the read.table() function and the functions derived from it. Suppose your file is a simple text file:
# DATA_DIR = "/Users/tbecchi/Desktop/repository/BDSB/"
file=read.table(paste0(DATA_DIR,"/Support_files/prova.txt"), header = TRUE) # header = TRUE: the first line is treated as the header
file
## id gender score
## 1 1 M 7.5
## 2 2 F 3.0
## 3 3 M 9.0
If the entries are comma-separated (a CSV file), use read.csv() instead:
file=read.csv(paste0(DATA_DIR,"/Support_files/prova.csv"), header = TRUE)
file
## id gender score
## 1 1 M 7.5
## 2 2 F 3.0
## 3 3 M 9.0
Note: read.csv() uses a period (.) as the decimal point and a comma (,) as the separator. If your file uses a comma (,) as the decimal point and a semicolon (;) as the separator, use the read.csv2() function.
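As a minimal sketch of the read.csv2() convention, the snippet below parses a small made-up table passed through the text argument instead of a file. The data and the object names (txt, scores) are illustrative assumptions, not part of the original example files.

```r
# Made-up data: semicolon as separator, comma as decimal point
txt <- "id;gender;score\n1;M;7,5\n2;F;3,0\n3;M;9,0"

# read.csv2() defaults to sep = ";" and dec = ","
scores <- read.csv2(text = txt)

# The score column is parsed as numeric, not as text
str(scores)
```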
Both read.table() and read.csv() have options to skip a given number of lines at the beginning of the file or to read only a given number of rows:
read.table(filename, skip = Number_Rows_You_Want_To_Skip)
read.table(filename, nrows = Number_Rows_You_Want_To_Read)
read.table(paste0(DATA_DIR,"/Support_files/prova.txt"), header = FALSE, skip = 1) # skip the first line of the file
## V1 V2 V3
## 1 1 M 7.5
## 2 2 F 3.0
## 3 3 M 9.0
read.csv(paste0(DATA_DIR,"/Support_files/prova.csv"), header = FALSE, skip = 1, nrows = 2) # skip the first line of the file, then read only the next two lines
## V1 V2 V3
## 1 1 M 7.5
## 2 2 F 3.0
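The skip option is applied before the header is read, so it can also be used to drop leading lines that come before the column names. The sketch below assumes a made-up file with two such lines; the text and the object names (txt, first_two) are hypothetical.

```r
# Made-up content: two descriptive lines, then the header, then the data
txt <- "Exported file\nunits: points\nid gender score\n1 M 7.5\n2 F 3.0\n3 M 9.0"

# Skip the two leading lines, read the header, then only the first two rows
first_two <- read.table(text = txt, skip = 2, header = TRUE, nrows = 2)
first_two
```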
If you have .xlsx files (Excel 2007 and later spreadsheets), you can use the xlsx package, which requires Java.
library(xlsx)
file=read.xlsx(paste0(DATA_DIR,"/Support_files/prova.xlsx"), sheetIndex = 1)
file
## id gender score
## 1 1 M 7.5
## 2 2 F 3.0
## 3 3 M 9.0
For pre-Excel 2007 spreadsheets (in xls format), the R developers recommend:
If you have access to Excel, export the data you want from Excel in tab-delimited or comma-separated form, and use read.delim or read.csv to import it into R.
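For the tab-delimited route, a minimal sketch could look like the following. It stands in for an exported file with a short made-up string passed through the text argument; txt and exported are hypothetical names.

```r
# Made-up tab-delimited content, as a spreadsheet export might produce
txt <- "id\tgender\tscore\n1\tM\t7.5\n2\tF\t3.0"

# read.delim() defaults to header = TRUE and sep = "\t"
exported <- read.delim(text = txt)
exported
```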
R allows you to save the data frames you have worked with as text files. The easiest way to do this is to use write.csv(). By default, write.csv() includes row names, but these are usually unnecessary and may cause confusion, so we often add row.names=F. Note that write.csv() always writes the column names: attempts to change col.names are ignored with a warning. If we do not need column names, we can use write.table() with sep="," and col.names=F instead.
write.csv(file,paste0(DATA_DIR,"/Support_files/prova.csv"), row.names = FALSE)
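Because write.csv() ignores col.names (with a warning), one way to write comma-separated data without a header line is write.table() with sep = ",". The sketch below writes to a temporary file so that it does not touch the example files. The data frame and the names scores and out are illustrative assumptions.

```r
# Made-up data frame mirroring the example table
scores <- data.frame(id = 1:3, gender = c("M", "F", "M"), score = c(7.5, 3, 9))

# Temporary output path (hypothetical, for illustration only)
out <- tempfile(fileext = ".csv")

# Comma-separated output with neither row names nor column names
write.table(scores, out, sep = ",", row.names = FALSE, col.names = FALSE)

# Inspect the raw lines: three data lines, no header
lines <- readLines(out)
lines
```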
We can save data in .txt format using write.table() by specifying the separator:
write.table(file,paste0(DATA_DIR,"/Support_files/prova.txt"), row.names = FALSE, sep="\t") # tab separator
write.csv() and write.table() are ideal for interoperability with other data analysis programs.
However, they will not preserve special attributes of the data structures, such as the column’s data type (character or factor) or the order of levels in factors.
To achieve this, the data should be saved in a format specific to R.
R-specific formats are:
RDS. By default, the RDS format is binary. It stores a single R object. The functions to read and write RDS files are readRDS() and saveRDS(), respectively.
RData. In this case you can save multiple R objects in a single file. The functions to write and read RData files are save() and load(), respectively.
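To illustrate why an R-specific format matters, the sketch below round-trips a factor with a custom level order through both CSV and RDS. The data frame, the temporary file paths, and all object names are assumptions made for this example.

```r
# Made-up factor with a deliberate, non-alphabetical level order
sizes <- data.frame(
  size = factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
)

# Temporary files (hypothetical paths, for illustration only)
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(sizes, csv_path, row.names = FALSE)
saveRDS(sizes, rds_path)

from_csv <- read.csv(csv_path)
from_rds <- readRDS(rds_path)

# The CSV round trip loses the original level order
levels(from_csv$size)

# The RDS round trip keeps the factor exactly as it was
levels(from_rds$size)
identical(sizes, from_rds)
```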
An important difference between saveRDS() and save() is that, with the former, when you readRDS() the data, you choose the name of the object by assigning the result, whereas with the latter, when you load() the data, the original object names are automatically used.
df=data.frame(day=c("Yesterday", "Today", "Tomorrow"), min_T=c(5,8,10), max_T=c(8, 15, 12), fog=c(TRUE, FALSE, TRUE))
l=list(weather=c("sunny", "sunny", "windy", "foggy"), temperature=c(12,13,10,6))
# RDS format
saveRDS(df, paste0(DATA_DIR,"/Support_files/prova2.RDS"))
df=readRDS(paste0(DATA_DIR,"/Support_files/prova2.RDS"))
df
## day min_T max_T fog
## 1 Yesterday 5 8 TRUE
## 2 Today 8 15 FALSE
## 3 Tomorrow 10 12 TRUE
# Rdata format
save(df, l, file=paste0(DATA_DIR,"/Support_files/prova3.Rdata"))
ld=load(paste0(DATA_DIR,"/Support_files/prova3.Rdata"))
ld # We have loaded these two objects into our R environment and can access them by their names.
## [1] "df" "l"
df
## day min_T max_T fog
## 1 Yesterday 5 8 TRUE
## 2 Today 8 15 FALSE
## 3 Tomorrow 10 12 TRUE
l
## $weather
## [1] "sunny" "sunny" "windy" "foggy"
##
## $temperature
## [1] 12 13 10 6
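Since load() restores objects under their original names, it can silently overwrite objects that already exist in the workspace. One possible safeguard, sketched here with made-up data and hypothetical names (temps, path, e), is to load into a separate environment through the envir argument.

```r
# Made-up object saved to a temporary RData file
temps <- data.frame(day = c("Today", "Tomorrow"), max_T = c(15, 12))
path <- tempfile(fileext = ".RData")
save(temps, file = path)

# Later, a different object with the same name exists in the workspace
temps <- "something else"

# Load into a fresh environment instead of the global workspace
e <- new.env()
loaded <- load(path, envir = e)

loaded    # names of the restored objects
e$temps   # the saved data frame, accessed through the environment
temps     # the workspace object was not overwritten
```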