
The term central tendency refers to values that describe the centre of a complete data set. There are different measures of central tendency. Each of them gives us a single number that attempts to summarise the entire data set.

Why do we need measures of central tendency?

Let us assume a pizza chain has 25 restaurants. A manager at the chain wants to know how the restaurants performed the previous day. Obviously, one way to do it would be to list each of the 25 restaurants and the number of pizzas sold at each one the previous day. Cumbersome!

Imagine that the chain had 150 restaurants. The manager would go crazy interpreting the numbers. Drawing any meaningful conclusion would be out of the question.

Now, imagine we could give the manager a single indicative number that conveys a good idea of how the restaurants did. More importantly, that single number would be far easier to comprehend and handle.

That single number is a measure of central tendency.

What are the various measures of central tendency?

The three most commonly used measures of central tendency are the mode, the median, and the mean.

Mode

The mode is the value that occurs most frequently in the data set. Look at the set of numbers given below. If you look closely, you’d notice that 33 occurs more often than any other value: twice. This value is the mode of this data. It can also be called the modal value.

[Image: the data set, in which 33 occurs twice]

What if the dataset was different, like this:

[Image: a second data set, in which 33 and 38 each occur twice]

Now there are two values, 33 and 38, that occur most often, twice each. What becomes the mode now? Both numbers are modes in this case.

A data set can have a single mode (unimodal), two modes (bimodal), three modes (trimodal), or, more generally, several modes (multimodal).

We can find the modes using Microsoft Excel. Newer versions of MS Excel (Excel 2010 and later) have two functions for this: =MODE.SNGL() and =MODE.MULT().

To find a single mode, we use the =MODE.SNGL() function and select the complete data set within the brackets as the argument of the function. When we hit Enter to execute the function, we get the single mode. In this case, the answer is 33.

[Image: =MODE.SNGL() in Excel returning a single mode]

To find multiple modes, we use the =MODE.MULT() function, which has to be entered as an array formula over a vertical range of cells. It returns all the modes that exist, arranged vertically. If we place the array formula over more cells than there are modes, the extra cells show #N/A.

[Image: multiple modes returned by =MODE.MULT()]

How do we enter a function as an array formula? First select the required number of cells (in this case, a vertical range), type the complete formula, and then press Ctrl+Shift+Enter together. The formula then appears enclosed in {curly braces} in the formula bar to indicate that it is an array formula.

[Image: =MODE.MULT() entered as an array formula in Excel]

You can read more about creating array formulas on the Microsoft Office Support site.

Older versions of Excel use the =MODE() function, which gives you either the single mode or, when there are several, just one of the modes.
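Outside Excel, the same two lookups can be sketched in Python with the standard library's `statistics` module (Python 3.8 or later). This is only an illustration: the first list holds the nine values of the data set used in this article, while the second list is a made-up variant with one extra 38 added so that two values tie, because the original bimodal data set is not reproduced here.

```python
from statistics import mode, multimode

# The nine values of the data set used in this article.
data = [23, 26, 27, 33, 33, 38, 42, 45, 47]

# Hypothetical bimodal variant: one extra 38, so 33 and 38 both occur twice.
bimodal_data = data + [38]

single_mode = mode(data)             # one most frequent value
all_modes = multimode(bimodal_data)  # every value tied for most frequent

print(single_mode)  # 33
print(all_modes)    # [33, 38]
```

Here `mode` plays roughly the role of =MODE.SNGL() and `multimode` roughly that of =MODE.MULT(), except that `multimode` returns a list of exactly the right length, so there are no leftover #N/A cells.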

Median

The median is the middle value of an ordered data set. This means that finding the median requires the data to be arranged first in either ascending or descending order (the direction doesn’t matter), and then picking the value in the middle. If the data set has an odd number of values, it is easy to find the one middle value. If the data set has an even number of values, there isn’t a single middle value.

The demon-king Ravan from Indian mythology has ten heads. Think about this: one head is attached to the torso, and that’s the middle head. Are the rest of the heads then attached asymmetrically? Like five on one side, and four on the other? That’s the trouble with the median for data sets with an even number of values.

Fortunately, statistics has a solution, or shall we say a workaround, for this. We simply take the arithmetic average of the two middle values as the median. Therefore, unlike the mode, there can only be a single median for a data set.

Microsoft Excel has a simple function to calculate the median, and it’ll do the ordering and all that for you. Simply key in =MEDIAN(), select the complete data set within the brackets, and hit Enter.

The data that we used above has an odd number of values.

[Image: the data set used for the mode, median, and mean]

The ordered data set is 23, 26, 27, 33, 33, 38, 42, 45, 47, and the middle value is 33 (it lies right in the middle, with four values on either side).

[Image: the median of the data set]
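The sort-then-pick-the-middle procedure can also be written out by hand. The sketch below is illustrative only: `middle_value` is a hypothetical helper name, the odd-length list is this article's data in shuffled order, and the six-value list is invented to show the even case. Python's own `statistics.median` is imported for comparison.

```python
from statistics import median

def middle_value(values):
    """Median by hand: sort, then take the middle value,
    or the average of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The article's nine values, deliberately out of order.
odd_data = [42, 23, 33, 47, 26, 38, 33, 45, 27]

# Hypothetical six-value set to show the even case.
even_data = [23, 26, 27, 33, 38, 42]

print(middle_value(odd_data))   # 33
print(middle_value(even_data))  # 30.0, the average of 27 and 33
```

Calling `median(odd_data)` and `median(even_data)` from the standard library gives the same two results.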

Mean

The arithmetic mean, or the average, is perhaps the most commonly used measure of central tendency. In fact, it’s so common that ‘average’ is part of our daily lexicon, unlike median or mode. You probably know this already, but I’ll state it here anyway: the average is calculated by dividing the sum of all the values by the number of values in the data set.

Microsoft Excel does this with the =AVERAGE() function, where the numbers in the set are included as arguments within the brackets.

[Image: the mean of the data set]
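As a quick worked check of the sum-divided-by-count rule, here is a minimal Python sketch applied to the nine values used in this article. The variable names are only illustrative.

```python
# The nine values of the data set used in this article.
data = [23, 26, 27, 33, 33, 38, 42, 45, 47]

total = sum(data)        # 314
count = len(data)        # 9
average = total / count  # sum of the values divided by how many there are

print(round(average, 2))  # 34.89
```

Note that the mean, about 34.89, is not itself one of the values in the data set, whereas the mode and the median (both 33) are.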

Choosing which Measure to Use

Having three different measures of central tendency can be confusing. When should we use them? Are there particular types of data that are suited to one over another?

The mode is suited to data with large variations, because it isn’t affected by extreme values. Think of it this way: as long as most of the values stay the same, a huge change in one of the extreme values would not change the mode at all. Imagine a company wants to order t-shirts for its staff. Unfortunately, they don’t have enough time to get everyone their correct size, and they can only buy all of them in the same size. What size should the company buy? The average size may not fit anyone well. The median size would be too big for roughly half the employees, and too small for the other half. The modal size would fit the most people. That’s the best option, isn’t it?
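The t-shirt decision can be sketched in a few lines of Python. The list of sizes below is entirely made up for illustration; the point is that the mode can be found for labels such as sizes, where adding up and dividing makes no sense.

```python
from collections import Counter

# Hypothetical staff t-shirt sizes (invented for this example).
sizes = ["S", "M", "M", "L", "M", "XL", "L", "M", "S", "L"]

counts = Counter(sizes)

# most_common(1) returns a list holding the single most frequent
# (value, count) pair.
modal_size, wearers = counts.most_common(1)[0]

print(modal_size, wearers)  # M 4
```

With these made-up numbers, ordering everything in size M fits four of the ten employees, more than any other single size would.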

The median is a good measure when we’re trying to get an idea about proportions. For example, when trying to judge whether a company pays handsome salaries, the mode isn’t useful because the modal value may be a lower salary level at which many entry-level staff work. The mean isn’t meaningful either, because a few very high or very low salaries can skew the average. A helpful measure to use here is the median. The median salary tells us that about half of the employees earn less than that level and about half earn more. Similarly, when talking about how young or old a country’s population is, the measure used is the median.

The mean is always popular, but it is best suited to cases where all the data points are comparable and there are no extreme outliers. We use the mean all the time: average winter temperature, average height of a class, average number of phone calls made in a day, average number of emails received, and many more. If there are large variations in the data at hand, you must consider using an additional measure along with the mean.
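To see how differently the three measures react to an extreme value, consider this small Python sketch. The salary figures are hypothetical (in thousands) and chosen only to make the effect visible: a single very high salary is appended to an otherwise ordinary list.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands (invented for this example).
salaries = [30, 30, 30, 35, 40, 45, 50, 55]

# The same staff plus one very highly paid executive.
with_outlier = salaries + [400]

mean_before, mean_after = mean(salaries), mean(with_outlier)
median_before, median_after = median(salaries), median(with_outlier)
mode_before, mode_after = mode(salaries), mode(with_outlier)

print(mean_before, round(mean_after, 2))  # 39.375 79.44
print(median_before, median_after)        # 37.5 40
print(mode_before, mode_after)            # 30 30
```

One extra salary roughly doubles the mean, nudges the median only slightly, and leaves the mode untouched, which is why the mean on its own can mislead when outliers are present.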

Can Mean = Median = Mode?

If the data follows a perfect normal distribution, then the mean, the median, and the mode are all equal. The normal distribution is also called the bell curve because of its characteristic shape. The highest point of the bell curve is where values occur most often, so that’s the mode. That point also divides the bell curve into two equal halves, so that’s the median. The mean also lies at the same point.
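A small numeric illustration of this, under a stated simplification: the list below is not a true normal distribution, just a tiny made-up sample that is symmetric and peaked in the middle like a bell curve. For such a sample, all three measures coincide.

```python
from statistics import mean, median, multimode

# Hypothetical symmetric, bell-shaped sample: counts of 1, 2, 3, 2, 1
# for the values 1 through 5.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]

sample_mean = mean(sample)        # 27 / 9
sample_median = median(sample)    # middle of the nine sorted values
sample_modes = multimode(sample)  # list of the most frequent value(s)

print(sample_mean, sample_median, sample_modes)
```

All three land on 3, the centre of the sample. If the sample were skewed, for example by adding a few large values on one side, the three measures would drift apart.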