2025-03-05

Reading files

R can read several file formats through the read.table() function and the functions derived from it. Suppose your file is a simple text file:

# DATA_DIR must be defined beforehand, for example:
# DATA_DIR = "/Users/tbecchi/Desktop/repository/BDSB/"
file=read.table(paste0(DATA_DIR,"/Support_files/prova.txt"), header = TRUE) # header = TRUE means that the first line is treated as the header
file
##   id gender score
## 1  1      M   7.5
## 2  2      F   3.0
## 3  3      M   9.0


If the entries are comma-separated (CSV file), use read.csv() instead:

file=read.csv(paste0(DATA_DIR,"/Support_files/prova.csv"), header = TRUE)
file
##   id gender score
## 1  1      M   7.5
## 2  2      F   3.0
## 3  3      M   9.0

Note: read.csv() uses a period (.) as the decimal point and a comma (,) as the separator. If your file instead uses a comma (,) as the decimal point and a semicolon (;) as the separator, use the read.csv2() function.
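
As a minimal sketch of the read.csv2() convention, the snippet below writes a small semicolon-separated file with decimal commas to a temporary location and reads it back. The temporary file and the object name scores are assumptions made for illustration; they are not part of the course files.

```r
# Hypothetical example file: semicolon as separator, comma as decimal point
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;gender;score", "1;M;7,5", "2;F;3,0", "3;M;9,0"), tmp)

# read.csv2() defaults to sep = ";" and dec = ","
scores <- read.csv2(tmp, header = TRUE)

# The score column is parsed as numeric, not as text
str(scores)
```

Reading the same file with read.csv() would put each whole line into a single column, because no separator would be found where one is expected.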


Both read.table() and read.csv() have options to skip a given number of lines at the beginning of the file or to read only some of its lines:

  • read.table(filename, skip=Number_Rows_You_Want_To_Skip)
  • read.table(filename, nrows=Number_Rows_You_Want_To_Read)

read.table(paste0(DATA_DIR,"/Support_files/prova.txt"), header = FALSE, skip = 1) # Skip the first line at the beginning of the file
##   V1 V2  V3
## 1  1  M 7.5
## 2  2  F 3.0
## 3  3  M 9.0
read.csv(paste0(DATA_DIR,"/Support_files/prova.csv"), header = FALSE, skip = 1, nrows = 2) # Skip the first line at the beginning of the file, then read only the following two lines
##   V1 V2  V3
## 1  1  M 7.5
## 2  2  F 3.0
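
Skipping lines also skips the header, which is why the columns above are named V1, V2 and V3. One possible way to keep the original names, sketched here under the assumption that the header sits on the first line, is to read the header separately and pass it through col.names. The temporary file and the names col_names and rows are hypothetical.

```r
# Hypothetical copy of the example data in a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("id gender score", "1 M 7.5", "2 F 3.0", "3 M 9.0"), tmp)

# Read the header plus one data row, keeping only the column names
col_names <- names(read.table(tmp, header = TRUE, nrows = 1))

# Skip the header and the first data row, then read the next two rows
rows <- read.table(tmp, header = FALSE, skip = 2, nrows = 2,
                   col.names = col_names)
rows
```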



Reading Excel files

If you have .xlsx files (spreadsheets from Excel 2007 and later), you can use the xlsx package, which requires Java.

library(xlsx)
file=read.xlsx(paste0(DATA_DIR,"/Support_files/prova.xlsx"), sheetIndex = 1)
file
##   id gender score
## 1  1      M   7.5
## 2  2      F   3.0
## 3  3      M   9.0

For pre-Excel 2007 spreadsheets (in .xls format), the R developers recommend:

If you have access to Excel, export the data you want from Excel in tab-delimited or comma-separated form, and use read.delim or read.csv to import it into R.
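
The import step of this recommendation can be sketched as follows. Since no real Excel export is available here, the snippet simulates a tab-delimited export with a temporary file; the file and the object name exported are assumptions for illustration only.

```r
# Hypothetical stand-in for a sheet exported from Excel as tab-delimited text
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tgender\tscore", "1\tM\t7.5", "2\tF\t3.0", "3\tM\t9.0"), tmp)

# read.delim() defaults to header = TRUE and sep = "\t"
exported <- read.delim(tmp)
exported
```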



Writing to file

R allows you to save the data frames you have worked with as text files.
The easiest way to do this is to use write.csv(). By default, write.csv() includes row names, but these are usually unnecessary and may cause confusion, so we often add row.names=FALSE. Note that write.csv() ignores the col.names option (with a warning); if you do not need column names, use write.table() with sep="," and col.names=FALSE instead.

write.csv(file,paste0(DATA_DIR,"/Support_files/prova.csv"), row.names = FALSE)
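
A minimal sketch of writing a comma-separated file without a header line follows. It uses write.table() rather than write.csv(), because write.csv() does not let you change col.names. The data frame scores and the temporary output path are hypothetical.

```r
# Hypothetical data frame mirroring the example file
scores <- data.frame(id = 1:3,
                     gender = c("M", "F", "M"),
                     score = c(7.5, 3, 9))

out <- tempfile(fileext = ".csv")

# sep = "," gives CSV-style output; both row names and column names are dropped
write.table(scores, out, sep = ",", row.names = FALSE, col.names = FALSE)

# Inspect the raw lines that were written
written <- readLines(out)
written
```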

We can save data in .txt format using write.table() by specifying the separator:

write.table(file,paste0(DATA_DIR,"/Support_files/prova.txt"), row.names = FALSE, sep="\t") # tab separator



Save in R-specific formats

write.csv() and write.table() are ideal for interoperability with other data analysis programs.

However, they do not preserve special attributes of the data structures, such as whether a column is a character vector or a factor, or the order of levels in factors.

To achieve this, the data should be saved in a format specific to R.
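
To illustrate the point, here is a small sketch comparing a CSV round trip with an RDS round trip for a factor whose levels have a custom order. The data frame sizes and the temporary paths are assumptions made for this example.

```r
# Hypothetical factor with a meaningful, non-alphabetical level order
sizes <- data.frame(size = factor(c("small", "large", "medium"),
                                  levels = c("small", "medium", "large")))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(sizes, csv_path, row.names = FALSE)
saveRDS(sizes, rds_path)

# Even when asking for factors, the CSV route rebuilds levels from the text
from_csv <- read.csv(csv_path, stringsAsFactors = TRUE)
from_rds <- readRDS(rds_path)

levels(from_csv$size) # alphabetical order, the original order is lost
levels(from_rds$size) # original order is preserved
```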

R-specific formats are:

  • RDS. By default, the RDS format is binary. It stores a single R object per file. The functions to read and write RDS files are readRDS() and saveRDS(), respectively

  • RData. In this case you can save multiple R objects in a single file. The functions to read and write RData files are load() and save(), respectively

An important difference between saveRDS() and save() is that with the former you choose the name of the object when you read the data back with readRDS(), whereas with the latter load() automatically restores the objects under their original names.

df=data.frame(day=c("Yesterday", "Today", "Tomorrow"), min_T=c(5,8,10), max_T=c(8, 15, 12), fog=c(TRUE, FALSE, TRUE))
l=list(weather=c("sunny", "sunny", "windy", "foggy"), temperature=c(12,13,10,6))

# RDS format
saveRDS(df, paste0(DATA_DIR,"/Support_files/prova2.RDS"))
df=readRDS(paste0(DATA_DIR,"/Support_files/prova2.RDS"))
df
##         day min_T max_T   fog
## 1 Yesterday     5     8  TRUE
## 2     Today     8    15 FALSE
## 3  Tomorrow    10    12  TRUE
# Rdata format
save(df, l, file=paste0(DATA_DIR,"/Support_files/prova3.Rdata"))
ld=load(paste0(DATA_DIR,"/Support_files/prova3.Rdata"))
ld # load() returns the names of the loaded objects; both are now in our R environment and can be accessed by name.
## [1] "df" "l"
df
##         day min_T max_T   fog
## 1 Yesterday     5     8  TRUE
## 2     Today     8    15 FALSE
## 3  Tomorrow    10    12  TRUE
l
## $weather
## [1] "sunny" "sunny" "windy" "foggy"
## 
## $temperature
## [1] 12 13 10  6
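
The naming difference between the two formats can be made more explicit with a short sketch in which the original object is removed before reading the files back. The object names temps, renamed and restored and the temporary paths are hypothetical.

```r
temps <- c(5, 8, 10)

rds_path <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

saveRDS(temps, rds_path)
save(temps, file = rdata_path)
rm(temps) # remove the original so we can see what each function restores

# readRDS() returns the object, so we choose the name ourselves
renamed <- readRDS(rds_path)

# load() recreates the object under its saved name and returns that name
restored <- load(rdata_path)
restored
temps
```

Because load() reuses the saved names, it will silently overwrite any object of the same name already in the environment, whereas readRDS() only changes what you explicitly assign.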