In article 73 in this series, I discussed the issue of whether to transform data. Students of Six Sigma often think (incorrectly) that there is something inherently wrong with their data if it is not normally distributed. We should always strive to understand why the data is distributed the way it is and recognize that not all processes produce outputs that are normally distributed.
In our Master Black Belt course, I included an assignment on data transformation so that students learn both how to perform data transformations and the issues that surround them.
One of my Master Black Belt students recently submitted an assignment in which she explained that she had attempted to transform actual measurement data from one of her manufacturing projects. She used a probability plot to test the data for normality. The null hypothesis is that the data follow a normal distribution. The alternate hypothesis is that the data do not follow a normal distribution. The data proved to be not normally distributed: the null hypothesis was rejected based on a p-value smaller than 0.005.
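For readers who work outside Minitab, the hypothesis test described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the normality test behind the probability plot is the Anderson-Darling test (the assignment does not name the test) and using the p-value approximation commonly attributed to D'Agostino and Stephens (1986). The function names and the two demonstration samples are hypothetical; they are not the student's data.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p-value) for H0: data are normal."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(fmean(x), stdev(x))  # mean and sigma estimated from the sample
    eps = 1e-12  # keeps log() finite for extreme points
    cdf = [min(max(fitted.cdf(v), eps), 1 - eps) for v in x]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a2 *= 1 + 0.75 / n + 2.25 / n**2  # small-sample adjustment
    # Piecewise approximation of the p-value (assumed formulas, see lead-in)
    if a2 >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2 + 0.0186 * a2**2)
    elif a2 >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2 - 1.38 * a2**2)
    elif a2 >= 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2 - 59.938 * a2**2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2 - 223.73 * a2**2)
    return a2, min(max(p, 0.0), 1.0)


def normal_scores(mu, sigma, n):
    """Hypothetical helper: n evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


# Illustrative samples only: one process versus two processes mixed together
samples = {
    "one process": normal_scores(12.0, 0.5, 100),
    "two processes": normal_scores(10.0, 0.5, 50) + normal_scores(14.0, 0.5, 50),
}
for label, sample in samples.items():
    a2, p = anderson_darling_normal(sample)
    print(f"{label}: A2 = {a2:.3f}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```

A small p-value only says that the data do not look normal. It does not say why, which is the question the rest of this article is about.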
The first line of defense when evaluating data is to make sure that there are no errors in how the data were collected or transcribed. There were no obvious errors in the data.
She then tried a Box-Cox transformation on the data. This transformation routine failed to transform the data into a normal distribution. Finally, she tried the Johnson transformation, which also failed to transform the data into a normal distribution.
I replied as follows: This is a very interesting data set. I ran the Individual Distribution Identification routine under Stat – Quality Tools in Minitab. The data as collected do not fit ANY of the available distributions. In addition, as you determined, the Box-Cox transformation and the Johnson transformation both failed to transform the data into a normal distribution.
The Box-Cox transformation does not guarantee normality. It does not check for normality; rather, it searches for the power (lambda) that produces the smallest standard deviation of the standardized transformed data. Transformed data are likely to be normally distributed when the standard deviation is minimized, but there is no guarantee. Likewise, there is no guarantee that the algorithms in the Johnson transformation will produce a normal distribution.
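The point that Box-Cox minimizes a standard deviation rather than testing for normality can be illustrated with a short Python sketch. It assumes the standardized form of the transformation (scaled by the geometric mean) and a simple grid search over lambda from -2 to 2; the grid, the function names, and the sample data are assumptions made for illustration, not Minitab's implementation.

```python
import math
from statistics import NormalDist, fmean, stdev


def standardized_boxcox(data, lam):
    """Box-Cox transform scaled by the geometric mean so lambdas are comparable."""
    logs = [math.log(v) for v in data]  # requires strictly positive data
    g = math.exp(fmean(logs))           # geometric mean
    if abs(lam) < 1e-12:
        return [g * lv for lv in logs]  # limiting case, lambda = 0
    scale = lam * g ** (lam - 1)
    return [(v**lam - 1) / scale for v in data]


def best_lambda(data, lambdas=None):
    """Pick the lambda whose standardized transform has the smallest std dev."""
    if lambdas is None:
        lambdas = [i / 20 for i in range(-40, 41)]  # assumed grid: -2.0 to 2.0
    return min(lambdas, key=lambda lam: stdev(standardized_boxcox(data, lam)))


def normal_scores(mu, sigma, n):
    """Hypothetical helper: n evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


# Illustrative data only: a right-skewed sample and a two-process mixture
skewed = [math.exp(z) for z in normal_scores(0.0, 0.6, 200)]
bimodal = normal_scores(10.0, 0.5, 50) + normal_scores(14.0, 0.5, 50)

print("lambda chosen for skewed data: ", best_lambda(skewed))
print("lambda chosen for bimodal data:", best_lambda(bimodal))
```

The search returns a lambda for any positive data set, including the bimodal one. Nothing in it asks whether the result is normal, so the transformed data still have to be tested afterward.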
I then created a histogram of the untransformed data. The data appear to be bimodal – i.e. they appear to be from two different distributions, such as two different machines or two different shifts. This could very well explain why the data cannot be successfully transformed into a normal distribution.
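A histogram does not require special software. The sketch below builds a plain-text histogram in Python; the two simulated "machines", their means, and the bin count are hypothetical values chosen only to show what a bimodal shape looks like, and they are not the student's data.

```python
import random


def histogram_counts(data, bins=8):
    """Return (bin edges, counts) for equal-width bins spanning the data range."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; no histogram to draw")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in data:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


# Simulated measurements from two hypothetical machines with different centers
rng = random.Random(7)
machine_a = [rng.gauss(10.0, 0.4) for _ in range(200)]
machine_b = [rng.gauss(14.0, 0.4) for _ in range(200)]

edges, counts = histogram_counts(machine_a + machine_b, bins=10)
for k, count in enumerate(counts):
    bar = "#" * (count // 4)  # one mark per four observations
    print(f"{edges[k]:6.2f} to {edges[k + 1]:6.2f} | {bar} ({count})")
```

Two separate humps with a valley between them suggest that the data should be stratified by source (machine, shift, operator) before any transformation is attempted.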
In practice, I always encourage students to proceed with caution. When instructing Black Belt students, I tell them that they should only transform data when there is a good reason to do so, such as a test that cannot otherwise be accomplished. Students often transform data to perform a test that requires normal data, instead of using an available test for non-normal data. Process capability analysis is a good example.
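As one illustration of an approach that does not require normal data, process performance can be estimated from percentiles instead of from the mean and standard deviation. The sketch below is a simplified version of the percentile method, assuming empirical percentiles at 0.135% and 99.865% in place of the plus and minus three sigma points. The specification limits and the simulated data are hypothetical, and empirical percentiles this far into the tails need a large sample to be trustworthy. Statistical packages typically take these percentiles from a fitted distribution instead.

```python
import math
import random


def quantile(sorted_data, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_data) - 1)
    return sorted_data[lower] + (pos - lower) * (sorted_data[upper] - sorted_data[lower])


def percentile_capability(data, lsl, usl):
    """Return (Pp, Ppk) computed from percentiles rather than mean and sigma."""
    x = sorted(data)
    p_low = quantile(x, 0.00135)    # stands in for mean - 3 sigma
    median = quantile(x, 0.5)       # stands in for the mean
    p_high = quantile(x, 0.99865)   # stands in for mean + 3 sigma
    pp = (usl - lsl) / (p_high - p_low)
    ppk = min((usl - median) / (p_high - median), (median - lsl) / (median - p_low))
    return pp, ppk


# Hypothetical right-skewed process data and hypothetical specification limits
rng = random.Random(3)
measurements = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
pp, ppk = percentile_capability(measurements, lsl=0.1, usl=8.0)
print(f"Pp = {pp:.2f}, Ppk = {ppk:.2f}")
```

No transformation is involved, so the results stay in the original units of measurement.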
Your comments or questions about this article are welcome, as are suggestions for future articles. Feel free to contact me by email at firstname.lastname@example.org.
About the author: Mr. Roger C. Ellis is an industrial engineer by training and profession. He is a Six Sigma Master Black Belt with over 50 years of business experience in a wide range of fields. Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC. For a more detailed biography, please refer to www.keyperformance.com.