Types of Mean (Measures of Central Tendency)
Simple tools for feature engineering
Introduction
To summarize a dataset with a single number, we use a measure of central tendency. There are three such measures: mean, median, and mode. Why do we need three measures when one (the mean) could seemingly do the job?
In this article, we are going to see the different types of Mean and their functionalities.
It's one of the most important concepts in statistics, a crucial subject for learning machine learning.
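To see why one measure isn't enough, here is a quick sketch (not from the article's notebook; the salary figures are made up) showing how a single outlier drags the mean while the median and mode barely move:
import statistics

salaries = [30, 32, 35, 35, 40, 400]  # hypothetical salaries with one outlier
print(statistics.mean(salaries))    # 95.33... - pulled up by the outlier
print(statistics.median(salaries))  # 35.0 - barely affected
print(statistics.mode(salaries))    # 35 - the most frequent value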
Colab Notebook
All the code executed in this article can be found in this notebook. Make a copy and run it as you follow along.
Arithmetic Mean
Logic
It's the mathematical expectation of a discrete set of numbers, i.e., the average. Its symbol is x̄, pronounced "x-bar". It is the sum of all the discrete values in the set divided by the total number of values. The formula for n values x1, x2, …, xn is:
x̄ = (x1 + x2 + … + xn) / n
Example:
Values of the set = { 2, 6, 7, 5, 5 }
Sum = 25
n, Total Values = 5
Arithmetic Mean = 25/5 = 5
Code:
import statistics

data = [2, 6, 7, 5, 5]
# sum of the values divided by their count
x = statistics.mean(data)
print("Arithmetic Mean is", x)
Output:
Arithmetic Mean is 5
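To tie the library call back to the formula, a from-scratch version (a minimal sketch, not part of the original notebook) looks like this:
data = [2, 6, 7, 5, 5]
x = sum(data) / len(data)  # (x1 + x2 + ... + xn) / n
print("Arithmetic Mean is", x)  # 5.0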
Trimmed Mean
Logic:
The arithmetic mean is influenced by outliers (extreme values) in the data, so the trimmed mean is used during pre-processing when we handle such data in machine learning.
It is a variation of the arithmetic mean: sort the data, drop a fixed number of values from each end of the sorted sequence, and then take the mean (average) of the remaining values.
Example:
Values of the set = { 1, 3, 4, 8 }
k = n × alpha; with alpha = 0.25, k = 4 × 0.25 = 1
Remaining values of the set = { 3, 4 }
Denominator = n − 2k = 4 − 2(1) = 2
Trimmed Mean = (3 + 4) / 2 = 3.5
Code:
from scipy import stats

data = [1, 3, 4, 8]
# drop alpha = 25% of the values from each end, then average the rest
x = stats.trim_mean(data, 0.25)
print("Trimmed Mean is", x)
Output:
Trimmed Mean is 3.5
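Writing the same computation by hand makes the sort-and-drop logic explicit (a sketch assuming alpha = 0.25, as in the example above):
data = [1, 3, 4, 8]
alpha = 0.25
k = int(len(data) * alpha)               # number of values to drop from each end
trimmed = sorted(data)[k:len(data) - k]  # {3, 4}
x = sum(trimmed) / len(trimmed)
print("Trimmed Mean is", x)              # 3.5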
Weighted Mean
Logic:
The arithmetic mean and the trimmed mean give equal importance to every value involved. In machine learning predictions, however, some parameters may matter more than others, so we assign higher weights to the values of those parameters. Likewise, a parameter in our dataset may be highly variable, so we assign lower weights to its values.
Example:
Values of the set = [ 0, 2, 1, 3]
Weight = [ 1, 0, 1, 1]
Sum(weight × value) = 0×1 + 2×0 + 1×1 + 3×1 = 4
Sum(weights) = 1 + 0 + 1 + 1 = 3
Weighted Mean = 4 / 3 ≈ 1.3333
Code:
import numpy as np

data = [0, 2, 1, 3]
# the zero weight drops the second value from the average
x = np.average(data, weights=[1, 0, 1, 1])
print("Weighted Mean is:", x)
Output:
Weighted Mean is: 1.3333333333333333
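The formula itself is just the weighted sum divided by the sum of the weights; a hand-rolled sketch (not from the original notebook):
data = [0, 2, 1, 3]
weights = [1, 0, 1, 1]
x = sum(d * w for d, w in zip(data, weights)) / sum(weights)
print("Weighted Mean is:", x)  # 1.3333...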
Geometric Mean
Logic:
While the arithmetic mean is built from the sum of the discrete values in the set, the geometric mean is built from their product: it is the nth root of the product of the n values. It is only defined for sets of positive values.
Example:
Values of the set = { 1, 3, 9 }
Product = 1 × 3 × 9 = 27
n, Total values = 3
Geometric Mean = 27^(1/3) = 3
Code:
from scipy import stats

# nth root of the product of the n values
x = stats.mstats.gmean([1, 3, 9])
print("Geometric Mean is", x)
Output:
Geometric Mean is 3.0
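For intuition, the same value can be computed directly as the nth root of the product (a minimal sketch; math.prod needs Python 3.8+):
import math

data = [1, 3, 9]
x = math.prod(data) ** (1 / len(data))  # 27 ** (1/3)
print("Geometric Mean is", x)           # ~3.0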
Harmonic Mean
Logic:
The harmonic mean plays its role when we calculate the mean of terms that are defined relative to a unit, i.e., rates. It is the reciprocal of the mean of the reciprocals of the data, and it is used when the data involves an inverse relation.
Example:
Values of the set : { 1, 3, 9 }
Sum of reciprocals = 1/1 + 1/3 + 1/9 = 13/9
n, Total Values = 3
Harmonic Mean = 3 / (13/9) = 27/13 ≈ 2.0769
Code:
from scipy import stats

# n divided by the sum of the reciprocals of the values
x = stats.mstats.hmean([1, 3, 9])
print("Harmonic Mean is", x)
Output:
Harmonic Mean is 2.076923076923077
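A hand-rolled version, plus the classic rate example the "unit" wording hints at: averaging speeds over two equal-length legs of a trip (the speed figures here are hypothetical):
data = [1, 3, 9]
x = len(data) / sum(1 / d for d in data)
print("Harmonic Mean is", x)  # ~2.0769

speeds = [60, 40]  # km/h over two equal distances
avg_speed = len(speeds) / sum(1 / s for s in speeds)
print("Average speed is", avg_speed)  # 48.0, not the arithmetic 50.0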
Conclusion
In this article, we have seen the various types of mean and how they can feed into feature selection, feature transformation, and feature engineering. They can also be used for filling missing values, as sketched below.
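As a closing sketch of the missing-value use case (hypothetical data; the "price" column is made up), a trimmed mean resists the outlier that would distort a plain-mean imputation:
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({"price": [10, 12, 11, 13, 500, np.nan]})
fill = stats.trim_mean(df["price"].dropna(), 0.2)  # 12.0, ignores the 500 outlier
df["price"] = df["price"].fillna(fill)
print(df)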