Model Formula Interface

You have seen this interface before, and now we discuss it in more detail. The interface is y ~ x, or more descriptively, response ~ predictor. Loosely, we see the response variable as being dependent on the predictor, which could by a single variable, as in y ~ x, or a combination of variables, as in y ~ x + z. (+ is overloaded to have a special meaning in the model formula interface, and thus does not necessarily mean “addition”. If you wish to have + mean literal “addition”, use the function I(), as in y ~ I(x + z).) Many function in R use the formula interface, and often include additional arguments such as data (for specifying a data frame containing the data and variables described by the formula) and subset (used for selecting a subset of the data, according to some logical rule). Functions that use this formula interface include boxplot(), lm(), summary(), and lattice plotting functions.

Below I demonstrate using boxplot()’s formula interface for exploring the ToothGrowth data more simply.

# Here, I plot the tooth growth data depending on supplement when dose ==
# 0.5
boxplot(len ~ supp, data = ToothGrowth, subset = dose == 0.5)

# I can create a boxplot that depends on both supplement and dosage
boxplot(len ~ supp + dose, data = ToothGrowth)

Here I compare means of tooth lengths using the formula interface in summary(), provided in the package Hmisc.

library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
# First, the result of summary
summary(len ~ supp + dose, data = ToothGrowth)
## len     N= 60 
## 
## +-------+---+--+--------+
## |       |   |N |len     |
## +-------+---+--+--------+
## |supp   |OJ |30|20.66333|
## |       |VC |30|16.96333|
## +-------+---+--+--------+
## |dose   |0.5|20|10.60500|
## |       |1  |20|19.73500|
## |       |2  |20|26.10000|
## +-------+---+--+--------+
## |Overall|   |60|18.81333|
## +-------+---+--+--------+
# A nice plot of this information (though the table is very informative; a
# plot may not be necessary)
plot(summary(len ~ supp + dose, data = ToothGrowth))