Hypothesis Testing Basics

Hypothesis testing is another common form of statistical inference. In hypothesis testing, our objective is to decide whether we have enough evidence to reject the null hypothesis, a statement about the population distribution that we believe a priori to be true, in favor of the alternative hypothesis, a statement about the population that contradicts the null hypothesis. Usually the null hypothesis is denoted by \(H_0\) and the alternative by \(H_A\).

The first statistical tests introduced to students are tests about the value of a population parameter (other tests are possible, though). These tests generally take the form:

\[H_0: \theta = \theta_0\] \[H_A: \left\{\begin{array}{l} \theta < \theta_0 \\ \theta \neq \theta_0 \\ \theta > \theta_0 \end{array}\right.\]

These are the tests that I discuss.
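For example (a hypothetical scenario also used in the R sketches below), to test whether a population mean \(\mu\) equals 5 against a two-sided alternative, we would write

\[H_0: \mu = 5 \qquad H_A: \mu \neq 5\]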

In statistical testing, one computes a test statistic, which is then used to obtain a p-value: the probability of observing a test statistic at least as “extreme” as the one actually observed if \(H_0\) were in fact true. Very small p-values are strong evidence against the null hypothesis. Usually what constitutes a “small” p-value is decided beforehand by choosing a level of significance, typically denoted \(\alpha\). If the p-value is less than \(\alpha\), \(H_0\) is rejected in favor of \(H_A\); otherwise, we fail to reject \(H_0\). Common choices of \(\alpha\) include 0.05, 0.01, 0.001, and 0.1. The smaller \(\alpha\), the more difficult it is to reject \(H_0\).
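As a small preview of the “by hand” approach discussed at the end of this section, the sketch below computes a two-sided p-value for a one-sample z-test of the hypotheses above, assuming a known \(\sigma\); the data are simulated and all numbers are hypothetical.

```r
# One-sample z-test of H0: mu = 5 vs. HA: mu != 5, with known sigma = 2
# (simulated data; all values are hypothetical)
set.seed(101)
x <- rnorm(30, mean = 5.4, sd = 2)   # simulated sample

mu0   <- 5
sigma <- 2
n     <- length(x)

z <- (mean(x) - mu0) / (sigma / sqrt(n))   # test statistic
p_value <- 2 * pnorm(-abs(z))              # two-sided p-value

alpha <- 0.05
p_value < alpha   # TRUE means we reject H0 at the 5% level
```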

In hypothesis testing, we need to be aware of two types of errors, called Type I and Type II errors. A Type I error is rejecting \(H_0\) when \(H_0\) is true. A Type II error is failing to reject \(H_0\) when \(H_A\) is true. For any test, we want to know the probability of making either type of error. The probability of a Type I error is \(\alpha\), the level of significance; this means we specify beforehand the Type I error rate we are willing to accept. Type II errors are much more complicated, since they depend not only on the true value of \(\theta\) (which we will call \(\theta_A\), the value of \(\theta\) under the alternative assumed true for Type II error analysis) but also on the specified \(\alpha\) and the sample size \(n\) (other parameters may be involved as well, but they are assumed fixed and outside our control). For any testing scheme, we denote the probability of a Type II error by \(\beta(\theta_A)\), the probability of failing to reject \(H_0\) when the true value of \(\theta\) is \(\theta_A\). Generally \(\beta(\theta_A)\) is large when \(\theta_A\) is close to \(\theta_0\) (in fact, \(\beta(\theta_0) = 1 - \alpha\)), and small when \(\theta_A\) is distant from \(\theta_0\). This should make intuitive sense; a big difference between \(\theta_0\) and \(\theta_A\) should be easy to detect, but a small difference may be more difficult. In practice, researchers pick a \(\theta_A\) representing a difference they want to detect along with a maximum acceptable Type II error probability \(\beta\), and then choose a sample size that gives the test the desired property.
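To make \(\beta(\theta_A)\) concrete, here is a sketch for the special case of the two-sided one-sample z-test above (a mean with known \(\sigma\)); the particular values of \(\mu_A\), \(\sigma\), and \(n\) are hypothetical.

```r
# beta(mu_A): probability of failing to reject H0: mu = mu0 when the
# true mean is mu_A, for a two-sided z-test with known sigma
beta_z <- function(mu_A, mu0, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  shift  <- (mu0 - mu_A) * sqrt(n) / sigma
  pnorm(z_crit + shift) - pnorm(-z_crit + shift)
}

beta_z(mu_A = 5.5, mu0 = 5, sigma = 2, n = 30)   # moderate: larger difference, easier to detect
beta_z(mu_A = 5.1, mu0 = 5, sigma = 2, n = 30)   # large: small difference, hard to detect
beta_z(mu_A = 5.0, mu0 = 5, sigma = 2, n = 30)   # equals 1 - alpha = 0.95
```

Increasing \(n\) in these calls shrinks \(\beta(\mu_A)\), which is exactly how a sample size is chosen to meet a desired Type II error probability.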

Some prefer to discuss the power of a test rather than the probability of a Type II error. The power of a test is the probability of rejecting \(H_0\) when \(\theta = \theta_A\). Power connects the two types of error, since it is defined as \(\pi(\theta_A) = 1 - \beta(\theta_A)\), and \(\pi(\theta_0) = \alpha\). The principles discussed above for the probabilities of Type I and Type II errors carry over directly to power.
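Base R’s power.t.test() carries out this kind of power and sample-size calculation for t tests; the effect size, standard deviation, and desired power below are hypothetical.

```r
# Power of a two-sided one-sample t test with n = 30 for detecting a shift of 0.5
power.t.test(n = 30, delta = 0.5, sd = 2, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")

# Sample size needed to reach power 0.9 against the same alternative
power.t.test(delta = 0.5, sd = 2, sig.level = 0.05, power = 0.9,
             type = "one.sample", alternative = "two.sided")
```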

As with confidence intervals, R has many functions for handling hypothesis testing (in fact, you have already seen most of them). We can perform hypothesis tests “by hand” (without using any function designed to carry out an entire test) or with functions designed specifically for hypothesis testing. We start with the “by hand” methods.
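As a reminder of one such function, t.test() (which you may have seen when computing confidence intervals) also reports a test statistic and a p-value when given a null value for the mean; the data below are simulated, and we return to functions like this after the “by hand” methods.

```r
set.seed(101)
x <- rnorm(30, mean = 5.4, sd = 2)             # simulated sample
t.test(x, mu = 5, alternative = "two.sided")   # reports the t statistic and p-value
```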