Statistical inference’s raison d’être is separating signal from noise. More precisely, statistical inference is used to determine whether there is enough evidence to detect an effect, or to locate a parameter, in the presence of “noise”, that is, random variation.
Suppose a researcher is trying to determine if a new drug under study lowers blood pressure. The researcher randomly assigns study participants to control and treatment groups. He stores his results in R vectors like so:
control <- c(124, 155, 120, 116)
treatment <- c(120, 108, 132, 112)
Is there a difference between control and treatment? Let’s check.
mean(treatment) - mean(control)
## [1] -10.75
A difference of -10.75 looks promising, but there is a lot of variation in the data. Is -10.75 large enough to conclude that the drug has an effect?
Suppose, for the moment, that it is not. Assume the treatment had no effect and the observed difference is due only to the random assignment of participants to the control and treatment groups. If that were the case, other random assignments would often produce differences just as extreme as, or more extreme than, the one observed.
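To make this idea concrete, a single hypothetical reassignment can be simulated with `sample()`. This is just a sketch of the logic; the particular assignment drawn is random and will differ between runs, so no output is shown.

```r
control   <- c(124, 155, 120, 116)
treatment <- c(120, 108, 132, 112)
pooled <- c(control, treatment)

# Randomly pick 4 of the 8 observations to serve as a "new" treatment group;
# the remaining 4 form the "new" control group
new_trt <- sample(1:8, size = 4)
mean(pooled[new_trt]) - mean(pooled[-new_trt])
```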
Let’s investigate this possibility. We will use the combn() function to find all possible ways of assigning individuals to the treatment group, with the rest going to the control group. combn(vec, n) finds all ways to choose n elements from vec, storing the result in a matrix in which each column represents one combination. We will pool the control and treatment groups into a single vector with 8 elements.
# Find all ways to choose the four indices that will represent a new random
# assignment to the treatment group. This is all we need to know to
# determine who is assigned to both control and treatment.
assn <- combn(1:(length(control) + length(treatment)), 4)
# Look at the result
assn[, 1:4]
## [,1] [,2] [,3] [,4]
## [1,] 1 1 1 1
## [2,] 2 2 2 2
## [3,] 3 3 3 3
## [4,] 4 5 6 7
# How many combinations are possible?
ncol(assn)
## [1] 70
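As a quick sanity check, the number of columns agrees with the binomial coefficient \({8 \choose 4}\), which R computes directly with choose():

```r
# 70 ways to choose 4 of 8 indices for the treatment group
choose(8, 4)
## [1] 70
```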
Now, let’s actually compute these differences and determine whether the observed one is “large”.
# Lump all data into a single vector
blood_pressure <- c(control, treatment)
blood_pressure
## [1] 124 155 120 116 120 108 132 112
# To demonstrate the idea of this analysis, let's consider just one new
# assignment of control and treatment, using the 4th column of assn. The new
# assignment is in ind.
ind <- assn[, 4]
# If blood_pressure[ind] is the treatment group, blood_pressure[-ind] is the
# control group.
blood_pressure[ind]
## [1] 124 155 120 132
blood_pressure[-ind]
## [1] 116 120 108 112
# What is the new difference?
mean(blood_pressure[ind]) - mean(blood_pressure[-ind])
## [1] 18.75
# Now, let's do this for all possible combinations, using apply
diffs <- apply(assn, 2, function(ind) {
  mean(blood_pressure[ind]) - mean(blood_pressure[-ind])
})
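Before comparing against the observed difference, it can help to look at the permutation distribution as a whole. The sketch below recomputes the differences so it is self-contained; hist() draws to the current graphics device.

```r
control   <- c(124, 155, 120, 116)
treatment <- c(120, 108, 132, 112)
blood_pressure <- c(control, treatment)
assn <- combn(1:8, 4)
diffs <- apply(assn, 2, function(ind) {
  mean(blood_pressure[ind]) - mean(blood_pressure[-ind])
})

# The distribution is symmetric about zero: each assignment has a mirror
# image (swap the roles of control and treatment) with the negated difference
summary(diffs)
hist(diffs, main = "Permutation distribution of mean differences")
```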
Now we can decide how “unusual” our initial difference of -10.75 between treatment and control is. Since we are trying to determine if the treatment reduces blood pressure, we decide that if the observed difference is unusually negative relative to the permutation distribution, there is evidence that the treatment works. So we count how many assignments produce differences more negative than the one observed.
sum(diffs < mean(treatment) - mean(control))
## [1] 11
mean(diffs < mean(treatment) - mean(control))
## [1] 0.1571429
It seems that about 16% of assignments to control and treatment result in differences more “extreme” than the one observed. This is not convincing evidence that the difference we saw reflects any real effect of the new drug; there is too much noise in the data to reach such a conclusion.
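For comparison, a classical two-sample t-test (base R’s t.test, which uses Welch’s version by default) reaches the same verdict on these data. This is only a cross-check, not part of the permutation procedure itself.

```r
control   <- c(124, 155, 120, 116)
treatment <- c(120, 108, 132, 112)

# One-sided test of whether the treatment mean is lower than the control mean;
# the p-value is well above conventional thresholds, matching the
# permutation result
t.test(treatment, control, alternative = "less")
```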
Let’s consider another problem. In the ToothGrowth data set, we can see the results of an experiment in which different supplements were used to examine the tooth growth of guinea pigs. Is there a difference between the supplements?
Let’s first see what the difference is.
# Find the means
supp_means <- aggregate(len ~ supp, data = ToothGrowth, mean)
supp_means
## supp len
## 1 OJ 20.66333
## 2 VC 16.96333
# The difference in means (VC - OJ)
diff(supp_means$len)
## [1] -3.7
There is a difference; it appears that the supplement VC (vitamin C as ascorbic acid) is associated with smaller tooth growth than the supplement OJ (orange juice). But is this difference significant?
In this case, our earlier trick will not work. There are 60 observations in ToothGrowth, 30 of which are in group OJ. This means that there are \({60 \choose 30} \approx 1.18 \times 10^{17}\) possible random assignments to the two groups (computed in R with choose(60, 30)). R will surely choke on that much computation, so we must use a new approach.
Rather than examine all possible assignments, let’s randomly sample assignments and see how often the sample produces results more extreme than what was observed. The principle is the same as before, but the approach is probabilistic rather than exhaustive.
# Let's do one random assignment as a proof of concept; each group has
# 30 observations, so we randomly select 30 indices to form the new
# VC group, with the rest forming the new OJ group
ind <- sample(1:60, size = 30)
with(ToothGrowth, mean(len[ind]) - mean(len[-ind]))
## [1] 0.7066667
# We will now do this many times
sim_diffs <- replicate(2000, {
  ind <- sample(1:60, size = 30)
  with(ToothGrowth, mean(len[ind]) - mean(len[-ind]))
})
# Proportion of random assignments with a more negative difference
mean(sim_diffs < diff(supp_means$len))
## [1] 0.0325
About 3.25% of the randomly sampled differences are “more extreme” (more negative) than what was observed, which is reasonably convincing evidence that tooth growth under VC is, on average, smaller than under OJ.
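As with the smaller example, the classical t-test gives a comparable answer here. This is again only a cross-check; ToothGrowth ships with base R’s datasets package, so no extra data is needed.

```r
# One-sided Welch t-test: is mean tooth growth greater for OJ than for VC?
# supp is a factor with levels OJ, VC, so alternative = "greater" tests
# mean(OJ) > mean(VC); the p-value is close to the permutation estimate
t.test(len ~ supp, data = ToothGrowth, alternative = "greater")
```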