Measures of spread

By looking at how data in a set is distributed, we can see how much the data varies. This helps us understand the data better and make good decisions based on it. Use this resource to learn about the measures of spread.

A measure of spread shows how spread out the numbers in a set of data are. It tells us whether the data values are close together or far apart.

Figure: Four targets, each showing a different spread of data.

Here, we will look at three measures of spread:

range
interquartile range
standard deviation.

Range

The most basic measure of spread is the range. This is the distance from the smallest to the largest value.

Let's consider two datasets:

Set A: \(4,4,5,5,5,6,6\)
Set B: \(1,3,4,5,6,7,9\)

Both sets A and B have mean = median = \(5\), but the data sets are quite different. The values in set A are less spread out than those in set B.

For set A:
\[\begin{align*} \textrm{Range} & = \textrm{Highest value}-\textrm{Lowest value}\\
& = 6-4\\
& = 2
\end{align*}\]

For set B:
\[\begin{align*} \textrm{Range} & = \textrm{Highest value}-\textrm{Lowest value}\\
& = 9-1\\
& = 8
\end{align*}\]

We can see that set B has greater spread than set A. But the problem with the range is that it uses only two of the values in the dataset. One of these may be an odd or unusual value called an outlier.

Consider the two sets of values below:

Set Y: \(1,1,2,2,2,2,2,100\)
Set Z: \(1,18,23,41,59,63,87,100\)

Both sets have a range of \(99\), yet they are very different. In the first set (set Y), most of the values are ones or twos, and the range is \(99\) only because of one unusual value, the \(100\). This value is an outlier, and it distorts the range.
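The range calculations above can be checked with a few lines of Python. This is a minimal sketch: the function name `data_range` and the variable names are illustrative choices, not part of the original resource.

```python
def data_range(values):
    """Return the range: highest value minus lowest value."""
    return max(values) - min(values)


# The four datasets used in this section.
set_a = [4, 4, 5, 5, 5, 6, 6]
set_b = [1, 3, 4, 5, 6, 7, 9]
set_y = [1, 1, 2, 2, 2, 2, 2, 100]
set_z = [1, 18, 23, 41, 59, 63, 87, 100]

for name, data in [("A", set_a), ("B", set_b), ("Y", set_y), ("Z", set_z)]:
    print(f"Range of set {name}: {data_range(data)}")
```

Sets Y and Z give the same result even though their values are distributed very differently, which is the weakness of the range described above.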

Interquartile range

The interquartile range (IQR) is the distance between the first quartile \(\textrm{Q}_{1}\) and the third quartile \(\textrm{Q}_{3}\):
\[\textrm{IQR}=\textrm{Q}_{3}-\textrm{Q}_{1}\]

The first and third quartiles are values that are \(\dfrac{1}{4}\) and \(\dfrac{3}{4}\) of the way through the ordered data.

\(\textrm{Q}_{1}\) is the median of the lower half of the data and \(\textrm{Q}_{3}\) is the median of the upper half of the data.

Let's consider sets Y and Z again. When we order the data, we can identify the median of each set, indicated by the arrow. This will cut the data into a lower half and an upper half.

For set Y:
\[1\quad1\quad2\quad2\quad\downarrow\quad2\quad2\quad2\quad100\]

We can then identify the median of each half, which will be \(\textrm{Q}_{1}\) and \(\textrm{Q}_{3}\).
\[1\quad\underbrace{1\quad\downarrow\quad2}_{\textrm{Q}_{1}}\quad2\quad\downarrow\quad2\quad\underbrace{2\quad\downarrow\quad2}_{\textrm{Q}_{3}}\quad100\]

We can calculate the medians.
\[\begin{align*} \textrm{Q}_{1} & = \frac{1+2}{2}\\
& = 1.5
\end{align*}\] \[\begin{align*} \textrm{Q}_{3} & = \frac{2+2}{2}\\
& = 2
\end{align*}\]

The interquartile range is:
\[\begin{align*} \textrm{IQR} & = \textrm{Q}_{3}-\textrm{Q}_{1}\\
& = 2-1.5\\
& = 0.5
\end{align*}\]

Let's find the IQR for set Z.
\[\quad1\quad\underbrace{18\quad\downarrow\quad23}_{\textrm{Q}_{1}}\quad41\quad\downarrow\quad59\quad\underbrace{63\quad\downarrow\quad87}_{\textrm{Q}_{3}}\quad100\]

We have:
\[\begin{align*} \textrm{Q}_{1} & = \frac{18+23}{2}\\
& = 20.5
\end{align*}\] \[\begin{align*} \textrm{Q}_{3} & = \frac{63+87}{2}\\
& = 75
\end{align*}\]

The interquartile range is:
\[\begin{align*} \textrm{IQR} & = \textrm{Q}_{3}-\textrm{Q}_{1}\\
& = 75-20.5\\
& = 54.5
\end{align*}\]

The IQR for set Z is much larger than for set Y. This reflects the data more accurately than the range does, because the IQR measures only the middle half of the data and so is not affected by extreme values such as outliers. For these two datasets, the IQR is a better summary of the spread than the range.
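The median-of-halves method used above can be sketched in Python as follows. The function names `quartiles` and `iqr` are illustrative. The handling of datasets with an odd number of values (leaving the middle value out of both halves) is an assumption, since the examples here only use even-sized sets.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of values is odd, the middle value is
    left out of both halves. The worked examples only cover even sizes.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to find quartiles")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(upper)


def iqr(values):
    """Return the interquartile range, Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1


set_y = [1, 1, 2, 2, 2, 2, 2, 100]
set_z = [1, 18, 23, 41, 59, 63, 87, 100]

print("IQR of set Y:", iqr(set_y))
print("IQR of set Z:", iqr(set_z))
```

Calculators and statistical software may use other conventions for locating quartiles, so their results can differ slightly from the values found by hand with this method.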

Standard deviation

Standard deviation (SD) gives the most comprehensive measure of spread. It measures how data points are spread out from the mean, rather than focusing on the span or central portion of the data like range and IQR do, respectively.

It takes into account all of the data by giving an indication of the typical or average distance of each score from the mean for the data.

Standard deviation, denoted by \(\sigma\) for population data and \(s\) for sample data, can be calculated using the formulas:
\[s=\sqrt{\frac{\sum(x-\overline{x})^{2}}{n-1}}\quad\textrm{or}\quad\sigma=\sqrt{\frac{\sum(x-\mu)^{2}}{N}}\]

where \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(n\) is the number of values in the sample (the sample size) and \(N\) is the number of values in the population.

It is usually much more convenient to use your calculator or a computer.

Some statistical tests use a measure of spread called variance, which is the square of the standard deviation:
\[\begin{align*} \textrm{Variance} & = s^{2}\\
& = \frac{\sum(x-\overline{x})^{2}}{n-1}
\end{align*}\]
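To make the formulas concrete, here is a minimal Python sketch that follows them step by step: find the mean, sum the squared distances from the mean, divide by \(n-1\) (sample) or \(N\) (population), and take the square root. The function names are illustrative, and set B from the Range section is reused as the example data.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Sample standard deviation s: square root of the sample variance."""
    return sqrt(sample_variance(values))


def population_sd(values):
    """Population standard deviation: same sum, but divided by N."""
    size = len(values)
    mean = sum(values) / size
    return sqrt(sum((x - mean) ** 2 for x in values) / size)


set_b = [1, 3, 4, 5, 6, 7, 9]

print("Sample variance:", sample_variance(set_b))            # 42 / 6 = 7.0
print("Sample SD:", round(sample_sd(set_b), 2))               # 2.65
print("Population SD:", round(population_sd(set_b), 2))       # 2.45
```

The population value is slightly smaller than the sample value because the same sum of squares is divided by \(N\) rather than \(n-1\).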

Let's calculate sample standard deviation for sets A and B. For set A:
\[4\quad4\quad5\quad5\quad5\quad6\quad6\]
\[\begin{align*} \overline{x} & = \frac{4+4+5+5+5+6+6}{7}\\
& = 5
\end{align*}\]

\[\begin{align*} s & = \sqrt{\frac{\sum(x-\overline{x})^{2}}{n-1}}\\
& = \sqrt{\frac{(4-5)^{2}+(4-5)^{2}+(5-5)^{2}+(5-5)^{2}+(5-5)^{2}+(6-5)^{2}+(6-5)^{2}}{7-1}}\\
& = \sqrt{\frac{1+1+0+0+0+1+1}{6}}\\
& = \sqrt{\frac{4}{6}}\\
& \approx 0.82
\end{align*}\]

For set B:
\[1\quad3\quad4\quad5\quad6\quad7\quad9\]
\[\begin{align*} \overline{x} & = \frac{1+3+4+5+6+7+9}{7}\\
& = 5
\end{align*}\]

\[\begin{align*} s & = \sqrt{\frac{\sum(x-\overline{x})^{2}}{n-1}}\\
& = \sqrt{\frac{(1-5)^{2}+(3-5)^{2}+(4-5)^{2}+(5-5)^{2}+(6-5)^{2}+(7-5)^{2}+(9-5)^{2}}{7-1}}\\
& = \sqrt{\frac{16+4+1+0+1+4+16}{6}}\\
& = \sqrt{\frac{42}{6}}\\
& = \sqrt{7}\\
& \approx 2.65
\end{align*}\]

The scores in set A are typically \(0.82\) away from the mean, but the scores in set B are typically \(2.65\) away from the mean. Although sets A and B have the same mean, the data points in set B are clearly more dispersed than those in set A.
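As noted earlier, a computer is usually more convenient than working by hand. One way to check the two results above is with Python's standard `statistics` module, where `stdev` computes the sample standard deviation (dividing by \(n-1\)). The variable names are illustrative.

```python
import statistics

set_a = [4, 4, 5, 5, 5, 6, 6]
set_b = [1, 3, 4, 5, 6, 7, 9]

for name, data in [("A", set_a), ("B", set_b)]:
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    # s rounds to 0.82 for set A and 2.65 for set B
    print(f"Set {name}: mean = {mean}, s = {s:.2f}")
```

For population data, `statistics.pstdev` divides by \(N\) instead.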

Exercise – calculating measures of spread

1. Find the standard deviation for the following dataset: \(12,12,13,14,14,15,15,15,16\).
2. A class of \(22\) students gained the following scores, out of \(10\), on a test: \(5,7,8,7,6,5,6,4,7,4,8,3,7,9,4,9,7,3,6,8,7,5\). Find the:
   a. range
   b. IQR
   c. standard deviation.
3. Pistol Pete is the star full-forward for the local football team. Last season he played \(20\) games and kicked the following number of goals in each game: \(5,6,6,5,7,4,3,1,3,8,7,8,6,0,5,2,7,6,5,6\).
   a. Find the mean and the standard deviation for the number of goals that Pete kicked per game.
   b. This season, the mean number of goals Pete kicks per game is \(5\), with a standard deviation of \(2.7\). In which season was his performance more consistent?

Answers

1. \(s\approx1.414\)
2. a. \(6\)
   b. \(2\)
   c. \(1.807\)
3. a. \(\overline{x}=5\) and \(s=2.22\)
   b. Last season. The smaller standard deviation (\(2.22\) compared with \(2.7\)) means less variation in the number of goals per game.
