In our Green Belt program we have an exercise where students create a histogram from a set of data. Most students use Excel to create their histogram, and here is the resulting graph:
When submitting their work, students quite often include a statement to the effect that the data is “normally distributed” because it appears to be a bell shaped curve, or because the data is “symmetrical about the mean”. But is this really true?
The histogram does give us a good indication of how the data is distributed. In this case the data is symmetrical around the mean, but it is more peaked in the middle than a perfectly normal distribution would be.
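One way to put numbers on "symmetrical but more peaked" is to compute the sample skewness and excess kurtosis. The sketch below is only an illustration and not part of the original exercise: the function name shape_summary and the sample values are hypothetical, and it uses the simple population-moment formulas. A normal distribution has skewness 0 and excess kurtosis 0; a positive excess kurtosis is commonly associated with a sharper peak and heavier tails.

```python
from statistics import fmean, pstdev


def shape_summary(data):
    """Return (skewness, excess kurtosis) using population moment formulas."""
    m = fmean(data)
    s = pstdev(data, mu=m)
    n = len(data)
    skewness = sum(((v - m) / s) ** 3 for v in data) / n
    excess_kurtosis = sum(((v - m) / s) ** 4 for v in data) / n - 3.0
    return skewness, excess_kurtosis


# Hypothetical sample: symmetric about 0, with most values bunched in the middle.
sample = [-4, -1, -1, 0, 0, 0, 0, 1, 1, 4]
skewness, excess_kurtosis = shape_summary(sample)
print(f"skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
```

These two numbers describe the shape; they do not by themselves decide whether the departure from normality is statistically significant.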
Here is what the histogram looks like with a normal curve superimposed:
We can use a probability plot to determine objectively how well data fit any particular distribution. It is not black and white whether these data are normally distributed or not; rather it is a matter of how well the data fit or do not fit the normal distribution (or any other distribution).
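For readers who want to see what a normal probability plot is built from, here is a minimal sketch using only the Python standard library. It pairs each sorted observation with the standard normal quantile for its rank; if the data are normal, the pairs fall close to a straight line. The function names and the plotting positions (i - 0.5)/n are assumptions made for illustration (statistical packages use slightly different positions), and the confidence bands drawn by such packages are not reproduced here.

```python
from statistics import NormalDist, fmean


def normal_probability_points(data):
    """Pair each sorted observation with its theoretical standard normal quantile."""
    x = sorted(data)
    n = len(x)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5)/n for i = 1..n; an assumed, common convention.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, x))


def straightness(points):
    """Pearson correlation of the plotted pairs; values near 1 mean a straight line."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = fmean(qs), fmean(xs)
    sxy = sum((q - mq) * (x - mx) for q, x in points)
    sqq = sum((q - mq) ** 2 for q in qs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sqq * sxx) ** 0.5
```

The correlation is only a rough summary of how straight the plot is. It is a matter of degree, in keeping with the point above that fit is not black and white.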
Here is a probability plot of the same data using a 90% confidence level:
The null hypothesis when using a probability plot is that there is no difference between the distribution of the data and the distribution being tested (in this case the normal distribution). The alternative hypothesis is that there is a difference. In hypothesis testing we draw a conclusion by comparing the calculated p value (the probability of seeing data that depart from the tested distribution at least as much as ours do, if the null hypothesis were true) to a threshold value based on how willing we are to reject the null hypothesis in error. The threshold value, alpha, is one minus our level of confidence.
In this case the threshold value for 90% confidence is 1 - 0.90 = 0.10, or 10%. The calculated p value is much lower than the threshold value, so we reject the null hypothesis. The probability plot indicates that the data do NOT fit a normal distribution.
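The decision rule can be sketched in code. The article does not name the test behind the p value, so the example below assumes an Anderson-Darling normality test, which statistical software commonly reports alongside a normal probability plot. The function names are hypothetical, the mean and standard deviation are estimated from the sample, and the p value comes from a published piecewise approximation (D'Agostino and Stephens, 1986), so it may differ slightly from what a given package prints.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p value) for H0: data are normal."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # parameters estimated from the sample
    tiny = 1e-12  # keeps log() finite for extreme observations
    cdf = [min(max(fitted.cdf(v), tiny), 1.0 - tiny) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n**2)  # small-sample adjustment
    # Piecewise approximation of the p value (assumed; see lead-in).
    if a_star >= 0.6:
        p = math.exp(1.2937 - 5.709 * a_star + 0.0186 * a_star**2)
    elif a_star > 0.34:
        p = math.exp(0.9177 - 4.279 * a_star - 1.38 * a_star**2)
    elif a_star > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a_star - 59.938 * a_star**2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a_star - 223.73 * a_star**2)
    return a_star, min(max(p, 0.0), 1.0)


def normality_decision(p_value, confidence=0.90):
    """Compare the p value with alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return "reject H0" if p_value < alpha else "fail to reject H0"
```

Calling anderson_darling_normal on a data set and passing the resulting p value to normality_decision with confidence=0.90 mirrors the comparison described above: a p value below 0.10 leads us to reject the hypothesis that the data are normal.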
The lesson here is that we should be careful to make statements about data based on evidence and facts, not based on appearances or assumptions.
Your comments or questions about this article are welcome, as are suggestions for future articles. Feel free to contact me by email at firstname.lastname@example.org.
About the author: Mr. Roger C. Ellis is an industrial engineer by training and profession. He is a Six Sigma Master Black Belt with over 48 years of business experience in a wide range of fields. Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC. For a more detailed biography, please refer to www.keyperformance.com.