A given set of data has to be analyzed before it can be properly understood. Certain measures give an idea of the overall distribution of the observations in a data set; these measures are called descriptive statistics.
Measures of Central Tendency
These measures are also termed statistical averages, or simply averages. In a typical data set there is a central value around which the remaining observations tend to cluster, and that value is the most representative figure of the entire data set. The three most popular measures of central tendency are the mean, median and mode. Let us discuss each one in turn.
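The mean is the most common measure of central tendency. It is obtained by dividing the sum of all the values in the data set by the total number of items in the data set. For n observations $x_1, x_2, \ldots, x_n$, the mean is given as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$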
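The following minimal Python sketch adds up all observations and divides by their count. The function name `arithmetic_mean` is an illustrative choice. The data set is the one used in the median example of this section. Python's standard `statistics.mean` performs the same calculation.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: the sum of all observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [3, 6, 8, 4, 9, 2, 4, 1]
print(arithmetic_mean(data))  # 37 / 8 = 4.625
```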
Sometimes a weight is also given for each item in the data set. To obtain a more realistic average in that case, the weighted mean is calculated as follows, where $w_i$ is the weight of observation $x_i$:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
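The weighted mean multiplies each value by its weight, adds the products, and divides by the total weight. The sketch below shows this in Python. The marks and credit-hour weights are hypothetical numbers chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical example: marks in three subjects, weighted by credit hours
marks = [80, 90, 70]
credits = [4, 3, 1]
print(weighted_mean(marks, credits))  # 660 / 8 = 82.5
```

With equal weights the result reduces to the ordinary mean, which here would be 80. The heavier weight on the 80 and 90 marks pulls the weighted average up to 82.5.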
The mean is the simplest and most commonly used measure of central tendency in daily life. For example, Vineet walks an average of 6000 steps daily, and Peter spends approximately Rs. 25,000/- on travelling each month.
Remember that the mean is affected by outliers, so it may not always give a true representation of the central tendency.
The mean can become a flawed representative of the data set when the data contains outliers. In such cases, the median is used as the measure of central tendency. The median divides the data set into two equal parts: half of the items lie below the median and the remaining half lie above it. In other words, the median is the middle value of the data set. To obtain the median, the data is first arranged in either ascending or descending order.
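The effect of an outlier can be shown with Python's standard `statistics.mean` and `statistics.median`. The monthly spending figures below are hypothetical and were invented for this illustration. A single extreme value pulls the mean far away from the typical values, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical monthly spending figures (Rs.)
spending = [20000, 22000, 25000, 24000, 23000]
# The same data with one extreme value (outlier) added
with_outlier = spending + [200000]

print(mean(spending), median(spending))                 # mean 22800, median 23000
print(round(mean(with_outlier), 2), median(with_outlier))  # 52333.33 23500.0
```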
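The median is determined as follows. Suppose there are n observations in a data set, arranged in order, and let $x_{(k)}$ denote the kth item in that order. Then

$$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right], & \text{if } n \text{ is even} \end{cases}$$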
Let us take an example to understand it further:
Suppose the data set is 3, 6, 8, 4, 9, 2, 4, 1. Here n = 8, which is an even number. Arranged in ascending order, the data is 1, 2, 3, 4, 4, 6, 8, 9, so the median is the average of the 4th and 5th items: 1/2 [4 + 4] = 4.
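The positional rule can be written as a short Python function. The name `positional_median` is an illustrative choice. The step that is easy to forget is sorting the data before picking the middle position. For the data set 3, 6, 8, 4, 9, 2, 4, 1, the sorted order is 1, 2, 3, 4, 4, 6, 8, 9. The standard `statistics.median` implements the same rule.

```python
def positional_median(values):
    """Median by the positional rule: sort, then take the middle item (odd n)
    or the average of the two middle items (even n)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)   # the data must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # the ((n + 1)/2)th item; 0-based index n // 2
    # average of the (n/2)th and (n/2 + 1)th items (0-based indices mid - 1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(positional_median([3, 6, 8, 4, 9, 2, 4, 1]))  # 4.0
print(positional_median([3, 6, 8, 4, 9, 2, 4]))     # 4 (odd n: sorted 2 3 4 4 6 8 9)
```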
The median is a positional average, and it is also useful for qualitative phenomena that can be ranked but not measured directly. It is not frequently used in sampling statistics.
The mode is the most frequently occurring observation in the data set. The word "mode" comes from the French word "mode", meaning fashion. It is mainly useful for finding which size is most in demand, so that a manufacturer can produce larger quantities of that size. For example, if T-shirts of size 40 are most in demand, the manufacturer will plan to make more size-40 T-shirts. Like the median, the mode is a positional average and is not affected by extreme values. A data set may have more than one mode, or no mode at all.
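The sketch below counts occurrences with `collections.Counter` and returns every value that ties for the highest count, so a data set with two modes yields both. It assumes that a data set in which no value repeats has no mode, and it returns an empty list in that case. The function name `modes` and the T-shirt sizes are hypothetical. Python's `statistics.multimode` (3.8+) also returns all most-common values, but for data with no repeats it returns every value instead of reporting "no mode". That difference is why the sketch adds an explicit check.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value. An empty list means 'no mode'
    (assumed convention: no value occurs more than once)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value is unique, so there is no mode
    return [value for value, count in counts.items() if count == highest]

# Hypothetical T-shirt sizes sold in a day
sizes = [38, 40, 42, 40, 44, 40, 38]
print(modes(sizes))            # [40]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
print(modes([5, 7, 9]))        # []      (no mode)
```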
Now we will consider one more example to understand the concept better:
Consider the following data set of airline ticket prices (in Rs.) for Indigo Airlines and Air India:
Applying the above concept, the mean price of the airline tickets is calculated as follows:
Now let us calculate the median of the given data set. Note that there are 10 items (an even number) for Indigo Airlines, while for Air India there are 9 items (an odd number).
The median of the Indigo Airlines ticket prices lies between the 5th and 6th positions, and is therefore calculated as (6000 + 3700)/2 = Rs. 4850/-.
The median of the Air India ticket prices is the value in the 5th position, since it is the middle value of the data set; therefore it is Rs. 6200/-.
The mode is the most frequent value in the data set, i.e. the value that occurs the greatest number of times.
By observation, the values 3000 and 6000 each appear twice among the Indigo Airlines ticket prices, so Indigo Airlines has two modes: Rs. 3000/- and Rs. 6000/-. Among the Air India ticket prices, no value appears more than once, so we can conclude that there is no mode for Air India.
Note: As discussed earlier, a data set may have more than one mode, or no mode at all.
(1) Mean is the average value of a set of observations.
(2) Median is the middle value of a set of observations.
(3) Mode is the most frequently appearing value in a set of observations.
(4) Mean is affected by outliers, and therefore it may not always truly represent the central value of the data set.
(5) Mode and median are not affected by outliers.
(6) There is no single best measure among the mean, median and mode; the choice of measure depends on the requirements and is decided on a case-by-case basis.