Reading files

R can read several file formats through the read.table() function and the functions derived from it.

Suppose your file is a simple whitespace-delimited text file like the one below, followed by the call that reads it:

id  gender  score
1 M 7.5
2 F 3
3 M 9

foo = read.table(file = 'foo.txt', header = TRUE)
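To try this without creating foo.txt by hand, the sample can first be written to a temporary file. This is only a sketch in base R; the temporary path and the use of str() to inspect the result are illustrative choices, not part of the original example.

```r
# Write the sample table to a temporary file (the path is illustrative)
path <- tempfile(fileext = ".txt")
writeLines(c("id  gender  score",
             "1 M 7.5",
             "2 F 3",
             "3 M 9"), path)

# By default read.table() treats any run of whitespace as a separator
foo <- read.table(file = path, header = TRUE)

# Inspect the column types that R inferred
str(foo)
```

Whether gender comes back as a character vector or a factor depends on the R version, because the default of stringsAsFactors changed to FALSE in R 4.0.0.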

If the entries are comma-separated, as in the following file, use read.csv() instead:

id,gender,score
1,M,7.5
2,F,3
3,M,9

foo = read.csv(file = 'foo.csv', header = TRUE)

Note: read.csv() uses a period (.) as the decimal point and a comma (,) as the separator. If your file instead uses a comma (,) as the decimal point and a semicolon (;) as the separator, use read.csv2().
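As a minimal sketch of the second convention, the same sample can be rewritten with semicolons and decimal commas. The file contents and the name foo2 are made up for illustration.

```r
# Semicolon-separated values with decimal commas (made-up sample)
path <- tempfile(fileext = ".csv")
writeLines(c("id;gender;score",
             "1;M;7,5",
             "2;F;3",
             "3;M;9"), path)

# read.csv2() defaults to sep = ";" and dec = ","
foo2 <- read.csv2(file = path, header = TRUE)

# The score column is parsed as numeric, with 7,5 read as 7.5
foo2$score
```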

Reading Excel files

The R developers suggest:

The first piece of advice is to avoid doing so if possible! If you have access to Excel, export the data you want from Excel in tab-delimited or comma-separated form, and use read.delim or read.csv to import it into R.
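The export route can be sketched as follows. Since no real spreadsheet is available here, a temporary tab-delimited file stands in for the file exported from Excel; its contents and the name foo3 are assumptions for illustration.

```r
# Stand-in for a sheet exported from Excel as tab-delimited text
path <- tempfile(fileext = ".txt")
writeLines(c("id\tgender\tscore",
             "1\tM\t7.5",
             "2\tF\t3",
             "3\tM\t9"), path)

# read.delim() defaults to sep = "\t" and header = TRUE
foo3 <- read.delim(path)
str(foo3)
```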

Another (less straightforward) way is to open the spreadsheet, select the desired area and copy it to the clipboard. Then, on a Mac, run:

read.delim(pipe("pbpaste"))
# On Windows, use read.delim("clipboard") instead

Writing to files

The easiest way to do this is to use write.csv(). By default, write.csv() includes row names, but these are usually unnecessary and may cause confusion.

# A sample data frame
data <- read.table(header=TRUE, text='
                   subject sex size
                   1   M    7
                   2   F    NA
                   3   F    9
                   4   M   11
                   ')


# Write to a file, suppress row names
write.csv(data, "data.csv", row.names=FALSE)

# Same, except that instead of "NA", output blank cells
write.csv(data, "data.csv", row.names=FALSE, na="")

# Use tabs, suppress row names and column names
write.table(data, "data.csv", sep="\t", row.names=FALSE, col.names=FALSE)
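The confusion caused by row names shows up when the file is read back. The following sketch rebuilds the sample data frame with data.frame() and writes it to temporary files (both are illustrative choices) to compare the two settings.

```r
# Same sample data as above, built directly so the block runs on its own
data <- data.frame(subject = 1:4,
                   sex = c("M", "F", "F", "M"),
                   size = c(7, NA, 9, 11))

with_names    <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(data, with_names)                        # default: row names written
write.csv(data, without_names, row.names = FALSE)  # row names suppressed

# The default output gains an extra, unnamed first column on re-import
ncol(read.csv(with_names))
ncol(read.csv(without_names))
```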

Saving in R data format

write.csv() and write.table() are best for interoperability with other data analysis programs.

They will not preserve special attributes of the data structures, such as whether a column is a character type or factor, or the order of levels in factors.
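A small round trip illustrates the loss. The data frame below is made up, and stringsAsFactors = TRUE is passed explicitly so that the column is re-imported as a factor regardless of the R version.

```r
# A factor whose level order is deliberate, not alphabetical (made-up data)
df <- data.frame(dose = factor(c("low", "high", "medium"),
                               levels = c("low", "medium", "high")))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Re-import the column as a factor
back <- read.csv(path, stringsAsFactors = TRUE)

levels(df$dose)    # the order chosen when the factor was created
levels(back$dose)  # rebuilt from the text, in sorted order
```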

To preserve these attributes, write the data out in one of R's own formats.

  • The first method is to output R source code that will re-create the object. This should work for most data objects, but it may not be able to faithfully re-create some more complicated data objects.
# Save in a text format that can be easily loaded in R
dump("data", "data.Rdmpd")

data1 = data
# Can save multiple objects:
dump(c("data", "data1"), "data.Rdmpd")

# To load the data again:
source("data.Rdmpd")
# When loaded, the original data names will automatically be used.
  • The next method is to write out individual data objects in RDS format. This format can be binary or ASCII. Binary is more compact, while ASCII will be more efficient with version control systems like Git.
# Save a single object in binary RDS format
saveRDS(data, "data.rds")

# Or, using ASCII format
saveRDS(data, "data.rds", ascii=TRUE)

# To load the data again:
data <- readRDS("data.rds")
  • It’s also possible to save multiple objects into a single file, using the RData format.
# Save an object in binary RData format
save(data, file="data.RData")

# Or, using ASCII format
save(data, file="data.RData", ascii=TRUE)

# Can save multiple objects
save(data, data1, file="data.RData")

# To load the data again:
load("data.RData")

An important difference between saveRDS() and save() is that readRDS() returns the object, so you choose the name you assign it to, whereas load() restores the objects under their original names.
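The difference can be seen in a short sketch. The data frame, the temporary paths and the names copy and restored are illustrative assumptions.

```r
# A small data frame with a factor whose level order is not alphabetical
data <- data.frame(subject = 1:4,
                   sex = factor(c("M", "F", "F", "M"), levels = c("M", "F")))

rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")
saveRDS(data, rds_path)
save(data, file = rdata_path)

# readRDS() returns the object, so it can be bound to any name
copy <- readRDS(rds_path)
levels(copy$sex)   # the level order survives the round trip

# load() re-creates objects under their saved names and
# invisibly returns those names as a character vector
rm(data)
restored <- load(rdata_path)
restored
is.data.frame(data)
```

Because load() silently overwrites any existing object with the same name, readRDS() is often the safer choice when only one object needs to be stored.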

A work by Matteo Cereda and Fabio Iannelli