Ever wondered where the Gaussian distribution is used in the real world?
This article walks through a simple example of using the Gaussian distribution in a real-world scenario.
Imagine you work at a company named XYZ as a data analyst, a data scientist, or even an HR manager. Your company decides to make its own brand of t-shirt, with the company logo, for all employees. Let’s say the company has about 100k employees. The t-shirts come in several sizes (S, M, L, XL).
The question you want to answer is: “How many of your employees wear an XL t-shirt?”
The first task is to collect the data. But can you collect data from all 100k employees? No, that is not practical: it is too time-consuming and too costly. There is a workaround, though. For a t-shirt, it is reasonable to assume that if we know a person’s height, we can determine which size will fit them. This is basic domain knowledge that anyone can come up with.
We can easily collect the heights of 500 randomly chosen employees. From those measurements we can estimate the mean and standard deviation, which are the parameters of the Gaussian distribution.
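As a minimal sketch of this step, the snippet below estimates the mean and standard deviation from a sample of 500 heights. The heights here are simulated, because the article has no real measurements; the population values of 170 cm and 10 cm and every variable name are assumptions made only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated sample is reproducible

# Assumed population parameters, used only to simulate the 500 measurements.
ASSUMED_MEAN_CM = 170.0
ASSUMED_SD_CM = 10.0
SAMPLE_SIZE = 500

# In practice this list would hold the measured heights of 500 random employees.
sample_heights = [random.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM) for _ in range(SAMPLE_SIZE)]

# Sample mean and sample standard deviation (n - 1 in the denominator).
sample_mean = statistics.mean(sample_heights)
sample_sd = statistics.stdev(sample_heights)

print(f"estimated mean: {sample_mean:.1f} cm")
print(f"estimated S.D.: {sample_sd:.1f} cm")
```

With real data you would replace the simulated list with the measured heights and keep the last four lines unchanged.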
We assume here that heights are approximately normally distributed, with the estimated mean and standard deviation as parameters:
heights ~ N(mean, S.D^2)
If heights follow a Gaussian distribution, we can construct the cumulative distribution function (C.D.F.) and assume that it looks like the figure below. The CDF ranges from 0 to 1, and its value at a given height is the fraction of people who are that height or shorter.
Let’s say that people taller than 200 cm wear an XL t-shirt.
So, for the figure above, the CDF at 200 cm is about 95%, which means that about 5% (100 – 95) of your employees wear XL. With 100k employees, that is roughly 5,000 XL t-shirts.
You can read a lot of information off this CDF, which plays an important role in exploratory data analysis.
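The same calculation can be sketched in code using the Gaussian CDF from Python's standard library. The mean of 170 cm and standard deviation of 18 cm are hypothetical values chosen for illustration, not figures from the article, and the function name `fraction_taller_than` is likewise made up for this example.

```python
from statistics import NormalDist


def fraction_taller_than(threshold_cm, mean_cm, sd_cm):
    """Fraction of a Gaussian population above threshold_cm, i.e. 1 - CDF."""
    return 1.0 - NormalDist(mu=mean_cm, sigma=sd_cm).cdf(threshold_cm)


# Hypothetical parameters; in practice use the mean and S.D. estimated from the sample.
MEAN_CM = 170.0
SD_CM = 18.0
XL_THRESHOLD_CM = 200.0   # threshold taken from the article
TOTAL_EMPLOYEES = 100_000

xl_fraction = fraction_taller_than(XL_THRESHOLD_CM, MEAN_CM, SD_CM)
xl_count = round(xl_fraction * TOTAL_EMPLOYEES)

print(f"estimated share wearing XL: {xl_fraction:.1%}")
print(f"estimated XL t-shirts to order: {xl_count}")
```

The result depends entirely on the assumed parameters, so treat the printed numbers as an illustration of the method rather than an answer for any real company.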
The Gaussian distribution is a theoretical model that approximates many natural phenomena. We can check whether heights follow a Gaussian distribution by using a Q-Q plot, also called a quantile-quantile plot. If heights are not normally distributed, we cannot use the Gaussian C.D.F. in this way.
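To make the Q-Q idea concrete, here is a rough sketch that builds the points of a Q-Q plot by hand: the sorted sample values are paired with the quantiles of a Gaussian fitted to the same sample. If the data are close to Gaussian, the points lie near a straight line, so their correlation is close to 1. The helper names, the simulated data, and the use of a correlation as a numeric summary are all choices made for this illustration; in practice you would usually draw the plot and inspect it by eye.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each sorted sample value with the matching quantile of a fitted Gaussian."""
    n = len(sample)
    ordered = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 suggest a Gaussian shape."""
    points = qq_points(sample)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(1)
# Simulated data: one Gaussian sample and one clearly skewed (exponential) sample.
gaussian_sample = [random.gauss(170.0, 10.0) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

gaussian_r = qq_correlation(gaussian_sample)
skewed_r = qq_correlation(skewed_sample)
print(f"Q-Q correlation, Gaussian sample: {gaussian_r:.3f}")
print(f"Q-Q correlation, skewed sample:   {skewed_r:.3f}")
```

The Gaussian sample should score noticeably closer to 1 than the skewed one, which is the numeric counterpart of the Q-Q points hugging a straight line.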
To install a Python library for the Gaussian distribution, you can use:
pip install Bibekdistribution