In Defence of Liberty

Driven by data; ridden with liberty.

Statistics and Lampposts IV: Averages

Sometimes, it is necessary to review the foundations. When assessing numerical data, there are several distinct ways of determining the average of a dataset. One of the most commonly used averages is the arithmetic mean, defined as the sum of all the numbers divided by how many numbers there are. For example, the arithmetic mean of the set 25, 34, 46, 57 and 63 is 45, because their sum is 225 and there are 5 numbers. When people say the word ‘average’, they usually mean the arithmetic mean.
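As a minimal sketch, the definition translates directly into a few lines of Python. The function name arithmetic_mean is purely illustrative; Python’s standard library already offers statistics.mean for the same job.

```python
def arithmetic_mean(values):
    """Sum all the values, then divide by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

data = [25, 34, 46, 57, 63]
print(arithmetic_mean(data))  # 45.0 (225 divided by 5)
```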

The median is found by placing all the numbers in ascending order and choosing the middle value. If there is an even number of values, there are two middle values, and the median is defined as the mean of these two. Re-using the above dataset, its median is 46. There are other, less common averages. The mode is simply the value that occurs most often; in the above dataset every value appears exactly once, so no single value stands out as the mode. The mid-range is the arithmetic mean of the maximum and minimum values in the dataset. The mid-range of the above dataset is the mean of 25 and 63, that is, 44.
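The three remaining averages can be sketched in the same way. The function names are illustrative; one design choice worth flagging is that modes returns every value tied for the highest count, since a dataset can have several modes.

```python
from collections import Counter

def median(values):
    """Middle value of the sorted data; the mean of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Every value that occurs most often (ties are all returned)."""
    if not values:
        raise ValueError("the mode of an empty dataset is undefined")
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

def mid_range(values):
    """Arithmetic mean of the maximum and minimum values."""
    return (max(values) + min(values)) / 2

data = [25, 34, 46, 57, 63]
print(median(data))     # 46
print(mid_range(data))  # 44.0
print(modes(data))      # [25, 34, 46, 57, 63]: every value occurs once
```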

Each measure of the average provides a different insight into the underlying dataset. If a woman has two children, the mean and median heights of those children are equal. If a baby is then born into this family, the median height becomes the height of the shorter of the two elder children, and the mean height is drastically reduced. It would be erroneous to conclude that the children have shrunk, even though both the mean and the median have decreased. Similarly, statistical data on income growth and income distribution should be supplemented by observing labour market changes and income dynamics, since people enter the workforce, retire and move between income brackets from year to year. A leading economic fallacy, as identified by American economist Thomas Sowell, is to confuse the fates of statistical categories with the fates of actual human beings.
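The family example can be checked numerically. The heights below are hypothetical, chosen only for illustration; the sketch uses Python’s standard statistics module.

```python
from statistics import mean, median

# Hypothetical heights in centimetres, chosen purely for illustration.
before = [140, 120]      # the two elder children
after = before + [50]    # the same two children plus a newborn baby

for label, heights in (("before", before), ("after", after)):
    print(f"{label}: mean = {mean(heights):.1f} cm, median = {median(heights):.1f} cm")
# before: mean = 130.0 cm, median = 130.0 cm
# after: mean = 103.3 cm, median = 120.0 cm
```

Neither elder child’s height changes between the two lines of output; both averages fall only because the group being measured has changed.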

Unreasonable Reasoning

The Office for National Statistics (ONS) publishes the Annual Survey of Hours and Earnings (ASHE) every year. The survey is based on a 1% sample of employees on HM Revenue and Customs’ Pay-As-You-Earn (PAYE) records, and collects “information on the levels, distribution and make-up of earnings and hours paid”. Crucially, the provisional results of the 2013 ASHE found that median pay had increased by 2.2% between April 2012 and April 2013. On the day that the ASHE was published, numerous MPs shared this news.

Dr Éoin Clarke, a blogger who is apparently read by Labour leader Ed Miliband, claimed that MPs citing the ASHE provisional results were “misreporting” ONS pay growth data.

Dr Clarke even accused Grant Shapps MP of being “wrong”, reasoning that the data were too old to be correct. According to Dr Clarke, the ‘correct’ growth figure of 0.7% came from the Labour Market Statistics, published in November.

These latter statistics compare mean pay in July to September with the same period a year earlier. The discrepancy between the two figures does not stem from their respective ages, as median pay is unlikely to be that erratic, but from the fact that they are two entirely different measures of pay growth: one looks at the median, the other at the mean. Furthermore, mean pay growth for February to April 2013, compared with the same period of 2012, was 1.3%, or 0.9% if bonuses are excluded. So even over a period matching the timing of the ASHE, the mean measure does not reproduce the 2.2% median figure.

The median pay growth statistics in the ASHE are distinct from the mean average growth statistics in the Labour Market Statistics. (Photo: ONS)
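To see how a median-based and a mean-based growth figure can diverge even when computed from the same people over the same period, here is a minimal sketch. The pay figures are invented for illustration and are not ONS data; growth_percent is a hypothetical helper.

```python
from statistics import mean, median

def growth_percent(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Hypothetical weekly pay (in pounds) for the same five employees in two
# consecutive years; invented for illustration, not ONS data.
year_1 = [300, 400, 500, 600, 2000]
year_2 = [310, 410, 515, 610, 2000]

median_growth = growth_percent(median(year_1), median(year_2))
mean_growth = growth_percent(mean(year_1), mean(year_2))

print(f"median pay growth: {median_growth:.1f}%")  # 3.0%
print(f"mean pay growth:   {mean_growth:.1f}%")    # 1.2%
```

In this toy example the top earner’s pay is flat. That holds back the mean, which the large salary dominates, while the median depends only on the middle earner. Both numbers are correct; they simply answer different questions.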

Neither figure is wrong: both are legitimate and informative. It was simply wrong of Dr Clarke to say that Conservative MPs were “misreporting” ONS data. It is astounding that Dr Clarke, who regularly claims to be rebutting “Tory lies” with “actual truth”, has made such an elementary mistake. It goes to show that, particularly when discussing statistics, one should always check the foundations.



This entry was posted on December 14, 2013 in Statistics.