R would not be very useful if we had no way of loading and saving data. R has means for reading data from spreadsheets such as the .xls or .xlsx files made by Microsoft Excel. Functions for reading Excel files can be found in the xlsx or gdata packages.
Common plain-text formats for storing data include the comma-separated values format (.csv), the tab-separated values format (.tsv), and the fixed-width format (.fwf). These files can be read in using the read.csv(), read.table(), and read.fwf() functions (with read.csv() being merely a front end for read.table()). Note that read.table() splits fields on any whitespace by default, so a tab-separated file whose fields contain spaces should be read with sep = "\t". All of these functions parse a plain-text data file and return a data frame with the contents. Keep in mind that R guesses the type of each column from the file's contents. Usually it makes a good guess, but this is not guaranteed, and you may need to do some more data cleaning or give R more instructions on how to interpret the file.
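To illustrate what "giving R more instructions" can look like, here is a minimal sketch. The data are made up and supplied inline through the text argument so the example does not depend on any file; colClasses and na.strings are two of the standard read.csv() arguments for overriding R's guesses.

```r
# Hypothetical CSV contents, supplied inline through the `text` argument
csv_text <- "id,zip,score
1,02139,3.5
2,10001,n/a
3,94305,4.0"

# Default guessing: zip is read as an integer (the leading zero is lost),
# and score is not numeric because of the "n/a" entry
guessed <- read.csv(text = csv_text)
str(guessed)

# Explicit instructions: keep zip as text and treat "n/a" as missing
typed <- read.csv(text = csv_text,
                  colClasses = c(zip = "character"),
                  na.strings = c("NA", "n/a"))
str(typed)
```

Comparing the two str() outputs shows which columns R guessed differently from what was intended.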
In order to load a file, you must specify the location of the file. If the file is on your hard drive, there are a few ways to do so:
You could use the file.choose() command to browse your system and locate the file. Once done, you will have a text string describing the location of the file on your system.
Any R session has a working directory, which is where R looks first for files. You can see the current working directory with getwd(), and change the working directory with setwd(path), where path is a string for the location of the directory you wish to set as the new working directory.
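The working-directory functions described above can be sketched as follows. The use of tempdir() as the target directory and the file name myfile.csv are only assumptions for illustration; in practice you would point setwd() at the folder that holds your data.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()

# Switch to a scratch directory (tempdir() is used here only for illustration)
setwd(tempdir())
getwd()

# file.path() builds a path with the right separator for the platform
example_path <- file.path(getwd(), "myfile.csv")
example_path

# Restore the original working directory
setwd(old_wd)

# Interactive alternative (opens a file browser, so it is left commented out):
# chosen_path <- file.choose()
```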
Let’s assume we’re loading in a .csv file (the approach is similar for other formats). The command df <- read.csv("myfile.csv") instructs R to read myfile.csv and store the resulting data frame in df. Because we did not specify a full path, R looks for the file in the working directory. If the file were somewhere else, we would either change the working directory or pass the full path to the function, which may look something like read.csv("C:/path/to/myfile.csv") or read.csv("/path/to/myfile.csv"), depending on the system. Once done, df is ready for us to use.
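As a rough sketch of how the same approach carries over to other formats, the example below reads a tab-separated table with read.table() and a fixed-width table with read.fwf(). The data, column names, and column widths are invented for the example.

```r
# Tab-separated text: tell read.table() the separator and that a header exists
tsv_text <- "name\tage\nAna\t31\nBo\t45"
people <- read.table(text = tsv_text, sep = "\t", header = TRUE)
people

# Fixed-width text: read.fwf() needs the width of each column.
# The lines are written to a temporary file first so there is a file to read.
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("Ana31", "Bo 45"), fwf_file)
people_fwf <- read.fwf(fwf_file, widths = c(3, 2),
                       col.names = c("name", "age"),
                       strip.white = TRUE)  # drop the padding spaces
people_fwf
```

In both cases the result is an ordinary data frame, just as with read.csv().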
Suppose that the data file is on the Internet. You can pass the URL of the file to read.csv() and R will read the file online and make it available to you in your session. I demonstrate below:
# Total Primary Energy Consumption by country and region, for years 1980
# through 2008; in Quadrillion Btu (CSV Version). Dataset from data.gov,
# from the Department of Energy's dataset on total primary energy
# consumption. Download and load in the dataset
energy <- read.csv("http://en.openei.org/doe-opendata/dataset/d9cd39c5-492e-4e82-8765-12e0657eeb4e/resource/3c42d852-567e-4dda-a39c-2bfadf309da5/download/totalprimaryenergyconsumption.csv",
stringsAsFactors = FALSE)
# R did not parse everything correctly; turn some variables numeric
energy[2:30] <- lapply(energy[2:30], as.numeric)
# We want energy data for North American countries, from 2000 to 2008
us_energy <- subset(energy, select = X2000:X2008, subset = Country %in% c("Canada",
"United States", "Mexico"))
us_energy
## X2000 X2001 X2002 X2003 X2004 X2005 X2006
## 2 13.07669 12.87847 13.10786 13.52061 13.83128 14.16374 13.81736
## 4 6.37958 6.32931 6.32936 6.50563 6.48998 6.80188 7.36271
## 6 99.25385 96.53415 98.03879 98.31384 100.49743 100.60722 99.90566
## X2007 X2008
## 2 14.07179 14.02923
## 4 7.27651 7.30898
## 6 101.67563 99.53011
Naturally, you can export data frames into common formats as well. write.csv() writes comma-separated values. write.table() separates fields with spaces by default and writes tab-separated values when called with sep = "\t". Fixed-width output is available through write.fwf(), which comes from the gdata package rather than base R. Their syntax is similar. To save a .csv file, issue the command write.csv(df, file = "myfile.csv"), where df is the data frame to save and file is where to save it, which could be just a file name (resulting in the file being saved in the working directory) or an absolute path.
my_data <- data.frame(var1 = 1:10, var2 = paste("word", 1:10))
write.csv(my_data, file="my_data.csv")
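One detail worth knowing is that write.csv() and write.table() also write the row names as an extra column unless told otherwise. The sketch below writes the same data frame as comma-separated and tab-separated text and reads both back; the temporary file paths are an assumption made so the example does not touch the working directory.

```r
my_data <- data.frame(var1 = 1:10, var2 = paste("word", 1:10))

# Temporary file locations, used only for this sketch
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(my_data, file = csv_path, row.names = FALSE)

# write.table() separates fields with spaces unless sep is set
write.table(my_data, file = tsv_path, sep = "\t", row.names = FALSE)

# Read both files back and compare the results
from_csv <- read.csv(csv_path)
from_tsv <- read.delim(tsv_path)
all.equal(from_csv, from_tsv)
```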
There are other formats R can read and write. The foreign package allows R to read data files created by other statistical software packages such as SAS or Stata. The XML package allows R to read XML and HTML files. You can also read JSON files or data stored in Google Sheets. Refer to the textbook for more information.
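As a small, hedged illustration of the foreign package, the sketch below writes a made-up data frame to a Stata .dta file with write.dta() and reads it back with read.dta(). It assumes foreign is installed (it normally ships with R as a recommended package) and uses a temporary file path.

```r
# foreign normally ships with R as a recommended package
library(foreign)

# Made-up data for the example
survey <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write a Stata .dta file to a temporary path, then read it back
dta_path <- tempfile(fileext = ".dta")
write.dta(survey, dta_path)
from_stata <- read.dta(dta_path)
from_stata
```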