R can read several file formats through the read.table() function and the functions derived from it.
Suppose your file is a simple whitespace-delimited text file such as:
id gender score
1 M 7.5
2 F 3
3 M 9
It can be read into a data frame with:
foo = read.table(file = 'foo.txt', header = TRUE)
If the entries are comma-separated, as below, use read.csv() instead:
id,gender,score
1,M,7.5
2,F,3
3,M,9
foo = read.csv(file = 'foo.csv', header = TRUE)
Note: read.csv() uses the period (.) as the decimal point and the comma (,) as the separator. If the convention is instead that the decimal point is a comma (,) and the semicolon (;) is the separator, use the function read.csv2().
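As a minimal sketch of the difference, the same three rows can be parsed under both conventions. The inline text= strings below are made-up stand-ins for file contents, used only so that the example runs without any files on disk; the names us and eu are illustrative.

```r
# Period as decimal point, comma as separator
us <- read.csv(text = "id,gender,score
1,M,7.5
2,F,3
3,M,9", header = TRUE)

# Comma as decimal point, semicolon as separator
eu <- read.csv2(text = "id;gender;score
1;M;7,5
2;F;3
3;M;9", header = TRUE)

# Both calls parse the score column as the same numeric values
identical(us$score, eu$score)
```

If a file written in one convention is read with the function for the other, the columns are typically not split or converted as intended, so it is worth checking the result with str() after importing.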
For reading Excel spreadsheets, the R developers suggest:
The first piece of advice is to avoid doing so if possible! If you have access to Excel, export the data you want from Excel in tab-delimited or comma-separated form, and use read.delim or read.csv to import it into R.
Another (less straightforward) way is to open the spreadsheet, select the desired area and copy it to the clipboard. Then, for Mac users, do:
read.delim(pipe("pbpaste"))
# For Windows users: read.delim("clipboard")
The easiest way to write a data frame to a text file is to use write.csv(). By default, write.csv() includes row names, but these are usually unnecessary and may cause confusion.
# A sample data frame
data <- read.table(header=TRUE, text='
subject sex size
1 M 7
2 F NA
3 F 9
4 M 11
')
# Write to a file, suppress row names
write.csv(data, "data.csv", row.names=FALSE)
# Same, except that instead of "NA", output blank cells
write.csv(data, "data.csv", row.names=FALSE, na="")
# Use tabs, suppress row names and column names
write.table(data, "data.csv", sep="\t", row.names=FALSE, col.names=FALSE)
write.csv() and write.table() are best for interoperability with other data analysis programs.
They will not preserve special attributes of the data structures, such as whether a column is a character type or factor, or the order of levels in factors.
To preserve them, the data should be written out in one of R's own formats.
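To make the loss concrete, here is a small sketch of a CSV round trip. The data frame df and its dose column are hypothetical, chosen so that the factor levels have a deliberately non-alphabetical order; a temporary file is used so nothing is left behind.

```r
# Hypothetical data: a factor whose levels are ordered low < medium < high
df <- data.frame(dose = factor(c("low", "high", "medium"),
                               levels = c("low", "medium", "high")))

# Round trip through a temporary CSV file
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)

# The original level order
levels(df$dose)

# After the round trip the custom order is gone: re-creating the factor
# falls back to sorted (alphabetical) levels
levels(factor(df_csv$dose))
```

Whether the re-read column comes back as character or as a factor depends on the R version (the stringsAsFactors default changed in R 4.0), but in neither case is the original level order recovered.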
# Save in a text format that can be easily loaded in R
dump("data", "data.Rdmpd")
data1 = data
# Can save multiple objects:
dump(c("data", "data1"), "data.Rdmpd")
# To load the data again:
source("data.Rdmpd")
# When loaded, the original data names will automatically be used.
# Save a single object in binary RDS format
saveRDS(data, "data.rds")
# Or, using ASCII format
saveRDS(data, "data.rds", ascii=TRUE)
# To load the data again:
data <- readRDS("data.rds")
# Save one or more objects in binary RData format
save(data, file="data.RData")
# Or, using ASCII format
save(data, file="data.RData", ascii=TRUE)
# Can save multiple objects
save(data, data1, file="data.RData")
# To load the data again:
load("data.RData")
An important difference between saveRDS() and save() is that, with the former, when you readRDS() the data, you specify the name of the object, and with the latter, when you load() the data, the original object names are automatically used.
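The naming difference can be sketched as follows. The object scores and the names renamed and restored are hypothetical, and temporary files are used in place of real paths.

```r
scores <- c(7, NA, 9, 11)   # hypothetical object to save

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

saveRDS(scores, rds_file)
save(scores, file = rdata_file)

# Remove the original so that it is clear where the objects come from
rm(scores)

# readRDS() returns the object, so it can be assigned to any name
renamed <- readRDS(rds_file)

# load() re-creates the object under its original name and
# (invisibly) returns the names of the objects it restored
restored <- load(rdata_file)
restored
exists("scores")
```

One consequence of this design is that load() can silently overwrite an existing object that has the same name, whereas readRDS() only changes what you explicitly assign.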
A work by Matteo Cereda and Fabio Iannelli