Driven by data; ridden with liberty.
Statistical inferences are necessarily made when moving from polling samples and study groups to the general population. Hypothesis testing is a fundamental part of statistical analysis: it asks whether an observation supports or undermines a theory.
(Video: Khan Academy)
It is illustrative to consider an example of the hypothesis testing procedure in practice. There is an exam where the scores are normally distributed – as in a bell curve – with a mean of 100 and a standard deviation of 15. A class of nine pupils takes this exam, obtaining scores of 89, 99, 105, 116, 116, 118, 119, 125 and 128. Is this class above average? The first step is to state the null hypothesis that the class is average – the mean is 100. The alternative hypothesis is that the class is above the population average – the mean is greater than 100. This is an example of a one-sided test. The second step is to set a criterion for rejection: the level of significance. For the social and behavioural sciences, the level of significance is usually 5%. When studies say that their result was “statistically significant”, they mean that the p-value, which is the conditional probability of observing the given data or something more extreme if the null hypothesis is true, is less than 5%.
The third step is to calculate the test statistic. Now, assume the null hypothesis is correct. There are nine pupils, so the sampling distribution of the sample mean must be normally distributed, because the exam scores are; and its mean must be 100, because that is the null hypothesis. The standard error of this distribution is equal to the standard deviation (of the exam scores) divided by the square root of the sample size, so it is 15 divided by 3, which is 5.
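As a minimal sketch of this step, the standard error can be computed from the scores listed above (the variable names here are illustrative, not from the original):

```python
import math

scores = [89, 99, 105, 116, 116, 118, 119, 125, 128]
population_sd = 15  # given standard deviation of the exam scores

# Standard error = population SD divided by the square root of the sample size.
standard_error = population_sd / math.sqrt(len(scores))
print(standard_error)  # 5.0
```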
The nine pupils’ average is 112.8. How many standard errors is that away from our assumed mean? Doing the arithmetic, it is 2.56 standard errors away. This test statistic is then converted to the p-value. In this case, it is 0.0052, or 0.52%. The final step is simple: is this significant? Even if the level of significance were 1%, the observed data would still be significant, because the p-value is less than 1%. The null hypothesis is rejected, so the class is above average.
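The conversion from test statistic to p-value can be sketched with the standard library alone: the one-sided p-value is the upper tail of the standard normal, which the complementary error function gives directly.

```python
import math

sample_mean = 112.8  # the nine pupils' average, as in the text
z = (sample_mean - 100) / 5  # 2.56 standard errors above the assumed mean

# One-sided p-value: P(Z >= z) for a standard normal,
# computed via the complementary error function.
p_value = 0.5 * math.erfc(z / math.sqrt(2))
print(round(z, 2), round(p_value, 4))  # 2.56 0.0052
```

Since 0.0052 is below the 5% level of significance, the null hypothesis is rejected.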
This formal procedure remains the same: state the hypotheses, state the level of significance, calculate the p-value and make the decision. The test statistic will change depending on what is being tested. Philosophically, it is important to note that, in the case where the test statistic was not significant, the null hypothesis has merely failed to be rejected. If we were to say that the null hypothesis has been accepted because the evidence was not strong enough to reject it, that would be an appeal to ignorance.
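The four-step procedure can be wrapped into a single reusable sketch (the function name and signature are illustrative, assuming a known population standard deviation):

```python
import math

def one_sided_z_test(scores, mu0, sigma, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean > mu0, for scores drawn
    from a normal distribution with known standard deviation sigma.
    Returns the test statistic, the p-value and the decision."""
    se = sigma / math.sqrt(len(scores))          # standard error
    z = (sum(scores) / len(scores) - mu0) / se   # test statistic
    p = 0.5 * math.erfc(z / math.sqrt(2))        # one-sided p-value
    return z, p, p < alpha

# The class example: reject is True, so the class is above average.
z, p, reject = one_sided_z_test(
    [89, 99, 105, 116, 116, 118, 119, 125, 128], mu0=100, sigma=15)
print(round(z, 2), reject)
```

Note that `reject` being False would mean only a failure to reject, not an acceptance of the null hypothesis.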
There are two things for budding statisticians to be aware of: a result can never be completely certain, only significant at some level; and politicians and commentators will attempt to ignore this inherent uncertainty.