A common problem is determining whether two samples come from the same distribution. While statistical tests can help answer this question, visualization techniques are also quite effective. Many of the plots we have seen can be adapted to compare distributions from independent samples.
One way to do so is to draw two stem-and-leaf plots back to back, sharing common stems, with the leaves of each data set extending in opposite directions. Base R has no function for this, but the stem.leaf.backback() function in the aplpack package can create such a chart. Calling stem.leaf.backback(x, y) displays the distributions of the data in vectors x and y as a back-to-back stem-and-leaf plot. Here we use this function to examine the distribution of tooth lengths of guinea pigs given different supplements, contained in the ToothGrowth data set.
library(aplpack)  # provides stem.leaf.backback(); the package must be installed first
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# The split function can split data vectors depending on a factor. Here,
# split will split len in ToothGrowth depending on the factor variable supp,
# creating a list with the two variables we want
len_split <- with(ToothGrowth, split(len, supp))
OJ <- len_split$OJ
VC <- len_split$VC
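# Back-to-back stem-and-leaf plot; rule.line = 'Sturges' chooses the number of
# lines (stems) by Sturges' rule
aplpack::stem.leaf.backback(OJ, VC, rule.line = 'Sturges')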
We have seen comparative boxplots before; again, they can be quite useful for comparing distributions. Calling boxplot(x, y) with two data vectors x and y draws a comparative boxplot of the two distributions.
boxplot(OJ, VC)
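When the groups are defined by a factor in a data frame, the same comparison can be made without splitting the data first. The following is a minimal sketch using the formula interface of boxplot(), which labels each box with its factor level; the axis labels and the name bp are illustrative choices, not part of the original example.

```r
# Comparative boxplot of tooth length by supplement, using the formula
# interface: len is the response, supp is the grouping factor.
bp <- boxplot(len ~ supp, data = ToothGrowth,
              xlab = "Supplement", ylab = "Tooth length")

# boxplot() invisibly returns the statistics it drew; bp$names holds the
# group labels and bp$n the number of observations in each group.
bp$names
bp$n
```

This form extends naturally to factors with more than two levels, since one box is drawn per level.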
We can also use density estimates to compare distributions, like so:
plot(density(OJ), lty = 1)
lines(density(VC), lty = 2)
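One caveat with overlaying curves this way is that the axis limits are set by the first density plotted, so the second curve may be clipped if it is taller or wider. The sketch below, which assumes we want both curves fully visible and identified by a legend, computes shared axis limits first; the variable names, title, and axis label are illustrative.

```r
# Recreate the two samples so the example is self-contained
len_split <- with(ToothGrowth, split(len, supp))

# Compute both density estimates before plotting
d_OJ <- density(len_split$OJ)
d_VC <- density(len_split$VC)

# Axis limits that cover both curves
x_range <- range(d_OJ$x, d_VC$x)
y_range <- range(0, d_OJ$y, d_VC$y)

plot(d_OJ, lty = 1, xlim = x_range, ylim = y_range,
     main = "Tooth length by supplement", xlab = "Tooth length")
lines(d_VC, lty = 2)

# Identify each curve by its line type
legend("topright", legend = c("OJ", "VC"), lty = c(1, 2))
```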