You have seen this interface before; now we discuss it in more detail. The interface is y ~ x, or more descriptively, response ~ predictor. Loosely, we see the response variable as being dependent on the predictor, which could be a single variable, as in y ~ x, or a combination of variables, as in y ~ x + z. (+ is overloaded to have a special meaning in the model formula interface, and thus does not necessarily mean “addition”. If you wish to have + mean literal “addition”, wrap the expression in the function I(), as in y ~ I(x + z).) Many functions in R use the formula interface, and they often include additional arguments such as data (for specifying a data frame containing the variables named in the formula) and subset (for selecting a subset of the data according to some logical rule). Functions that use this formula interface include boxplot(), lm(), summary() (through the method supplied by the package Hmisc), and the lattice plotting functions.
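To make the difference between the formula meaning of + and literal addition concrete, here is a minimal sketch using lm(). The choice of the built-in mtcars data set and the object names fit_terms, fit_sum and fit_sub are assumptions made for illustration only; they do not appear elsewhere in this text.

```r
# mtcars is a built-in data set, used here only for illustration.

# '+' as a formula operator: wt and hp enter as two separate predictors.
fit_terms <- lm(mpg ~ wt + hp, data = mtcars)

# '+' as arithmetic: I() protects the sum, so wt + hp is a single predictor.
fit_sum <- lm(mpg ~ I(wt + hp), data = mtcars)

names(coef(fit_terms))  # intercept plus one coefficient each for wt and hp
names(coef(fit_sum))    # intercept plus one coefficient for I(wt + hp)

# subset is evaluated inside data, so cyl refers to the column mtcars$cyl.
fit_sub <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)
nobs(fit_sub)           # number of rows with cyl == 4
```

Counting the coefficients is a quick way to check which reading of + a formula received.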
Below I demonstrate using the formula interface of boxplot() to explore the ToothGrowth data more simply.
# Here, I plot the tooth growth data depending on supplement when dose ==
# 0.5
boxplot(len ~ supp, data = ToothGrowth, subset = dose == 0.5)
# I can create a boxplot that depends on both supplement and dosage
boxplot(len ~ supp + dose, data = ToothGrowth)
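If it helps to see how the formula len ~ supp + dose was split into groups, the following sketch asks boxplot() for its computed statistics instead of a drawing. The object name bp is hypothetical; plot = FALSE is an argument of the default boxplot method that the formula method passes along.

```r
# plot = FALSE makes boxplot() return its statistics without drawing anything.
bp <- boxplot(len ~ supp + dose, data = ToothGrowth, plot = FALSE)

bp$names  # one label per supp/dose combination
bp$n      # number of observations in each group
bp$stats  # whisker ends, hinges and median; one column per box
```

Here + does not add supp and dose; it crosses them, so every combination of the two gets its own box.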
Here I compare mean tooth lengths using the formula interface of summary(), as provided by the package Hmisc.
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
# First, the result of summary
summary(len ~ supp + dose, data = ToothGrowth)
## len N= 60
##
## +-------+---+--+--------+
## |       |   |N |len     |
## +-------+---+--+--------+
## |supp   |OJ |30|20.66333|
## |       |VC |30|16.96333|
## +-------+---+--+--------+
## |dose   |0.5|20|10.60500|
## |       |1  |20|19.73500|
## |       |2  |20|26.10000|
## +-------+---+--+--------+
## |Overall|   |60|18.81333|
## +-------+---+--+--------+
# A nice plot of this information (though the table is very informative; a
# plot may not be necessary)
plot(summary(len ~ supp + dose, data = ToothGrowth))
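As a cross-check that does not require Hmisc, the same group means can probably be recovered with aggregate() from base R, which also accepts a formula and a data argument. This is only a sketch of an alternative route, not the approach used above, and the object names by_supp and by_dose are hypothetical.

```r
# Mean tooth length by supplement, then by dose (base R, no extra packages).
by_supp <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)
by_dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)

by_supp
by_dose

# Overall mean, for comparison with the "Overall" row of the table.
mean(ToothGrowth$len)
```

The values should agree with the len column of the table printed by summary() above.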