Power analysis is an important part of planning a statistical study. Researchers usually try to make a test as powerful as reasonable (there is always a more powerful study than the one currently employed; simply use a sample size of \(n + 1\) rather than \(n\)). A useful technique in planning a study is to decide what effect size the test should be able to detect with some specified probability, then choose a sample size that will give this property to the test.
Whole R packages are devoted to providing tools for study planning, but the stats package included with any R installation has some functions for power analysis, including those for the two classes of tests we have studied: tests for a population mean, and tests for a population proportion.
power.t.test()
allows you to perform power analysis for the \(t\)-test. It can be used in different ways depending on which parameters are passed to it and which are set to NULL
, and you are encouraged to read the documentation (with, say, help("power.t.test")
) to see how this function behaves, but I will focus on two applications: computing power for a test, and computing a sample size for a given power.
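As a brief illustration of the NULL-parameter behavior, the sketch below leaves delta unspecified so that power.t.test() solves for it. The numbers (n = 30, sd = 1, power = 0.8) are made up for illustration and are not taken from any study; the call relies on the function's defaults for the test type and alternative.

```r
# Exactly one of n, delta, sd, power, and sig.level must be NULL;
# power.t.test() then solves for that quantity.
# Illustrative values only: n = 30, sd = 1, target power 0.8.
res <- power.t.test(n = 30, delta = NULL, sd = 1, power = 0.8)
min_delta <- res$delta  # smallest difference detectable with 80% power

# Round trip: plugging the solved delta back in should recover the power.
check <- power.t.test(n = 30, delta = min_delta, sd = 1)$power
print(c(min_delta = min_delta, recovered_power = check))
```

The same pattern applies to the other parameters: whichever one is left as NULL is the one the function computes.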
power.t.test(n, delta, sd, sig.level, type = "some.type.of.test", alternative = "some.alternative")
will compute the power of a test with sample size n
, where the difference between the mean under the null hypothesis and the mean under the alternative hypothesis, \(\mu_0 - \mu_A\), is delta
, the population standard deviation is sd
, and the significance level is sig.level
(by default, sig.level = .05
). The type of the test administered is specified by type
, and can be either "one.sample"
, "two.sample"
, or "paired"
(the meaning should be self-evident). alternative
specifies whether the alternative hypothesis is one-sided or two-sided. Notice that if alternative = "two.sided"
, delta
is also taken to indicate the direction of the difference between \(\mu_0\) and \(\mu_A\), and the reported power counts only rejections in that direction; if you want the power to include the probability of rejecting in the opposite direction as well, you should set the parameter strict
to TRUE
(by default, strict = FALSE
).
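To see what strict changes, the following sketch computes the power of a two-sided test twice, once with the default and once with strict = TRUE. The inputs (n = 20, delta = 5, sd = 20, a one-sample test) are illustrative assumptions chosen only to make the comparison concrete.

```r
# Two-sided power with the default (strict = FALSE): only rejections in the
# direction indicated by delta are counted.
power_default <- power.t.test(n = 20, delta = 5, sd = 20, sig.level = 0.05,
                              type = "one.sample",
                              alternative = "two.sided")$power

# With strict = TRUE, rejections in the opposite direction are counted too.
power_strict <- power.t.test(n = 20, delta = 5, sd = 20, sig.level = 0.05,
                             type = "one.sample",
                             alternative = "two.sided",
                             strict = TRUE)$power

# The gap is the probability of rejecting in the "wrong" direction.
extra <- power_strict - power_default
print(c(default = power_default, strict = power_strict, extra = extra))
```

The strict power can never be smaller than the default one, and the gap is typically tiny unless the true effect is close to zero.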
Suppose you plan to conduct a study to determine whether a new drug would induce weight loss. You plan to give all study participants both the drug and a placebo (in random order, with neither the study participants nor the experiment staff knowing which treatment is the drug and which is the placebo, thus helping combat bias), and measure the difference in weight loss when the two treatments are administered. Thus your hypotheses are:
\[H_0: \mu_{\text{drug}} = \mu_{\text{placebo}}\] \[H_A: \mu_{\text{drug}} > \mu_{\text{placebo}}\]
Your test will use a significance level of \(\alpha = .01\). You believe that \(\sigma = 20\) (you estimate high to be on the safe side). A researcher on staff suggests a sample size of 20. You are skeptical that a study with that sample size will be able to detect a five-pound difference in weight loss between the drug and the placebo, and thus compute \(\pi(5)\), the power of the test when the true difference is 5 lbs.
power.t.test(n = 20, delta = 5, sd = 20, sig.level = 0.01, type = "paired",
alternative = "one.sided")
##
## Paired t test power calculation
##
## n = 20
## delta = 5
## sd = 20
## sig.level = 0.01
## power = 0.09924502
## alternative = one.sided
##
## NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs
This study will only detect a five-pound difference about 10% of the time, which is too low for your liking. You want to find a sample size to guarantee detecting this difference with some higher probability. This may involve a much larger sample size.
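The power reported above can be reproduced by hand, which may help clarify what the function is computing. The sketch below assumes the usual noncentral \(t\) formulation for a one-sided paired test (degrees of freedom \(n - 1\), noncentrality parameter \(\sqrt{n}\,\delta/\sigma\)); the variable names are chosen for this illustration.

```r
n     <- 20    # number of pairs
delta <- 5     # true mean difference (lbs)
s     <- 20    # standard deviation of the within-pair differences
alpha <- 0.01  # significance level

df  <- n - 1                 # degrees of freedom of the paired t statistic
ncp <- sqrt(n) * delta / s   # noncentrality parameter under the alternative

# Critical value of the one-sided test, computed under the null hypothesis.
t_crit <- qt(alpha, df = df, lower.tail = FALSE)

# Power: probability that the noncentral t statistic exceeds the critical value.
power_manual <- pt(t_crit, df = df, ncp = ncp, lower.tail = FALSE)

power_builtin <- power.t.test(n = n, delta = delta, sd = s, sig.level = alpha,
                              type = "paired",
                              alternative = "one.sided")$power
print(c(manual = power_manual, builtin = power_builtin))
```

Both values should agree, which confirms that the low power comes from the small noncentrality parameter: with only 20 pairs, a five-pound difference is small relative to the standard error.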
power.t.test(delta = d, sd = s, sig.level = alpha, power = p, type = "some.type.of.test", alternative = "some.alternative")
is similar in usage to the earlier call, but instead of computing power it finds the sample size at which the power of the test, when the difference between \(\mu_0\) and \(\mu_A\) is d
, is p
. Thus this call is useful when planning a test and choosing an appropriate sample size for detecting a specified effect with some desired probability.
You have decided that you want the study to detect a five-pound difference in weight loss 90% of the time, and want to find a sample size that will give your test this property. You use R to find this sample size:
power.t.test(power = 0.9, delta = 5, sd = 20, sig.level = 0.01, type = "paired",
alternative = "one.sided")
##
## Paired t test power calculation
##
## n = 210.9878
## delta = 5
## sd = 20
## sig.level = 0.01
## power = 0.9
## alternative = one.sided
##
## NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs
The results suggest that you need 211 study participants for your study to have the desired property.
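Since the function returns a fractional sample size, it has to be rounded up, not to the nearest integer. A minimal sketch of that step, which also checks that the rounded sample size actually reaches the target power (the variable names are chosen for this illustration):

```r
res <- power.t.test(power = 0.9, delta = 5, sd = 20, sig.level = 0.01,
                    type = "paired", alternative = "one.sided")

# Always round up: rounding down would leave the power just under the target.
n_needed <- ceiling(res$n)

# Recompute the power at the integer sample size as a sanity check.
achieved <- power.t.test(n = n_needed, delta = 5, sd = 20, sig.level = 0.01,
                         type = "paired", alternative = "one.sided")$power
print(c(n_needed = n_needed, achieved_power = achieved))
```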
power.prop.test()
does for tests for population proportion what power.t.test()
does for tests for population mean. The syntax is similar, except there is no parameter type
, and delta
is replaced with p1
and p2
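, which specify the two proportions being compared. Be aware that this function is built for the two-sample test of equal proportions (its output says so, and reports n per group), so using one proportion as the null value and the other as the alternative value, as done here, only approximates the one-sample test. (There is no need to specify sd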
, so clearly that is not a parameter either.)
Gallup polls often survey samples of 1500 adults. Suppose a Gallup poll asks individuals whether they support Hillary Clinton or Donald Trump for President, and the poll uses a significance level of \(\alpha = .05\) (the default for power.prop.test()
, thus allowing us to ignore the parameter sig.level
). Suppose we wish to use the results of the poll to test:
\[H_0: p = .5\] \[H_A: p > .5\]
where \(p\) is the proportion of the population supporting Hillary Clinton. We would like to know if the Gallup poll can reasonably detect a 1% advantage for Clinton, and use power.prop.test()
to find out:
power.prop.test(n = 1500, p1 = .5, p2 = .51, alternative = "one.sided")
##
## Two-sample comparison of proportions power calculation
##
## n = 1500
## p1 = 0.5
## p2 = 0.51
## sig.level = 0.05
## power = 0.136286
## alternative = one.sided
##
## NOTE: n is number in *each* group
The Gallup poll will detect this difference only 14% of the time. If we wanted to detect this advantage for Clinton 95% of the time, what sample size do we need? By specifying power = .95
and omitting n
, we can find the desired sample size.
power.prop.test(power = .95, p1 = .5, p2 = .51, alternative = "one.sided")
##
## Two-sample comparison of proportions power calculation
##
## n = 54102.75
## p1 = 0.5
## p2 = 0.51
## sig.level = 0.05
## power = 0.95
## alternative = one.sided
##
## NOTE: n is number in *each* group
Rounding up, we would need a sample size of 54,103 to have a test with these properties (note that the output reports n as the number in each group).
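Because the output header reads "Two-sample comparison of proportions", the figures above describe a comparison of two independent groups of size n, not a single poll tested against the fixed value 0.5. For comparison, here is a rough sketch of the one-sample power calculation. It assumes the usual normal approximation to the sample proportion, which is an assumption of this sketch and not something power.prop.test() does; the variable names are hypothetical.

```r
p0    <- 0.50   # proportion under the null hypothesis
p1    <- 0.51   # proportion under the alternative of interest
n     <- 1500   # poll sample size
alpha <- 0.05   # significance level

# Reject H0 when the sample proportion exceeds this cutoff (computed under H0).
cutoff <- p0 + qnorm(1 - alpha) * sqrt(p0 * (1 - p0) / n)

# Power: probability of exceeding the cutoff when the true proportion is p1.
power_one_sample <- pnorm(cutoff, mean = p1,
                          sd = sqrt(p1 * (1 - p1) / n),
                          lower.tail = FALSE)

# Two-sample figure from power.prop.test(), for comparison.
power_two_sample <- power.prop.test(n = n, p1 = p0, p2 = p1,
                                    alternative = "one.sided")$power
print(c(one_sample = power_one_sample, two_sample = power_two_sample))
```

The one-sample power should come out higher than the two-sample figure, since a two-sample comparison has to estimate two proportions instead of one. The qualitative conclusion is unchanged: a poll of 1500 people is unlikely to detect a one-point advantage.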