# To Transform Data, or Not to Transform, That is the Question!

In a previous article in this series, I discussed whether or not to transform non-normally distributed data in order to make it normally distributed. In that article I stated the following:

“Data that follow a distribution that is either not symmetric or that is symmetric but not bell shaped are said to be non-normal.   A common misconception among students is that all data should be normally distributed, and that there is something inherently wrong if their data is not normally distributed.  In fact, data is not always normally distributed and we should not expect to always see normal distributions.  Non-normality is a fact of life.  Nature does not produce perfectly normal distributions.”

Recently one of my Black Belt students submitted the following Process Capability Analysis for Normally Distributed data, which he generated using Minitab. The analysis shows that the process is not at all capable of meeting the specification, with a Cpk value of -0.10. The process variable being analyzed is the time needed to complete a customer contact, with a requirement of five minutes or less.
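For readers less familiar with the index, Cpk compares the distance from the process mean to the nearer specification limit with three standard deviations. The minimal Python sketch below assumes the mean and standard deviation are supplied directly; the values used are hypothetical, chosen only to show how a negative Cpk arises when the mean lies beyond the upper limit. Note that Minitab estimates Cpk from within-subgroup variation, so its figures will generally differ from a calculation based on the overall standard deviation.

```python
def cpk(mean: float, sigma: float, lsl: float, usl: float) -> float:
    """Process capability index Cpk for a normally distributed characteristic.

    Cpk = min((USL - mean) / (3 * sigma), (mean - LSL) / (3 * sigma))
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    cpu = (usl - mean) / (3 * sigma)  # capability against the upper limit
    cpl = (mean - lsl) / (3 * sigma)  # capability against the lower limit
    return min(cpu, cpl)


# Hypothetical contact-time process whose mean exceeds the 5-minute limit.
example = cpk(mean=5.6, sigma=1.2, lsl=0.0, usl=5.0)
print(f"Cpk = {example:.2f}")  # prints "Cpk = -0.17"
```

Because the hypothetical mean (5.6 minutes) is above the upper limit (5 minutes), the upper term is negative, and that alone pulls Cpk below zero no matter how comfortable the lower side looks.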

The control charts show that the data come from a stable process (which is desirable), but I pointed out that one of the assumptions of this analysis is that the data are normally distributed. In this case the histogram clearly shows that the data are skewed to the right, and the normal probability plot indicates that the data are NOT normally distributed. I explained to the student that he should have performed his capability analysis using the option in Minitab for Non-Normal Data. I also pointed out that his lower specification limit should have been zero, not two minutes, since the only requirement is that a contact take five minutes or less.
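The histogram and normal probability plot are the evidence here. A sample skewness statistic is one simple numerical companion to the histogram, sketched below in Python; it is not a substitute for a formal normality test, and the contact times are hypothetical values invented for illustration.

```python
from statistics import fmean


def sample_skewness(data: list[float]) -> float:
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5.

    Positive values indicate a longer right tail; values near zero are
    consistent with, but do not prove, a symmetric distribution.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    if m2 == 0:
        raise ValueError("all observations are identical")
    return m3 / m2 ** 1.5


# Hypothetical contact times in minutes with a long right tail.
times = [3.1, 4.2, 4.8, 5.5, 6.0, 6.7, 7.9, 9.4, 12.0]
print(f"skewness = {sample_skewness(times):.2f}")
```

A clearly positive value agrees with what a right-skewed histogram shows visually.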

I used Minitab to determine that the distribution that best fits the data is the Largest Extreme Value Distribution. I then performed the capability analysis for Non-Normal Data using that distribution in Minitab and got the following result. This analysis also shows that the process is not at all capable of meeting the requirement.
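To make the idea behind a non-normal capability analysis concrete, the Python sketch below uses a percentile-based approach of the kind Minitab offers for non-normal data: the 0.135th and 99.865th percentiles of the fitted distribution stand in for mean ± 3 sigma, and the median stands in for the mean. The Largest Extreme Value location and scale are hypothetical values, not a fit to the student's data, and Minitab's exact computations may differ in detail. The sketch also reports the proportion of contacts expected to exceed the upper limit, which is often the more intuitive figure.

```python
import math


def lev_quantile(p: float, loc: float, scale: float) -> float:
    """Quantile of the largest extreme value (Gumbel max) distribution.

    CDF: F(x) = exp(-exp(-(x - loc) / scale)), so x_p = loc - scale * ln(-ln p).
    """
    return loc - scale * math.log(-math.log(p))


def lev_cdf(x: float, loc: float, scale: float) -> float:
    """Cumulative probability P(X <= x) for the largest extreme value distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))


def percentile_ppk(loc: float, scale: float, lsl: float, usl: float) -> float:
    """Percentile-based Ppk: the median replaces the mean, and the 0.135th and
    99.865th percentiles replace mean - 3 sigma and mean + 3 sigma."""
    median = lev_quantile(0.5, loc, scale)
    lower = lev_quantile(0.00135, loc, scale)
    upper = lev_quantile(0.99865, loc, scale)
    ppu = (usl - median) / (upper - median)  # upper-side capability
    ppl = (median - lsl) / (median - lower)  # lower-side capability
    return min(ppu, ppl)


# Hypothetical fitted parameters in minutes; not the student's data.
loc, scale = 5.0, 1.5
lsl, usl = 0.0, 5.0
print(f"Ppk = {percentile_ppk(loc, scale, lsl, usl):.2f}")
print(f"P(time > USL) = {1 - lev_cdf(usl, loc, scale):.3f}")
```

With these hypothetical parameters the median lies above the upper limit, so the index is negative and roughly 63% of contacts would be expected to exceed five minutes.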

The student subsequently stated that he had tried the Non-Normal Capability Analysis but got results similar to those from the Normal Capability Analysis, i.e., the process was not capable. He referred to the fact that the process was not capable as a “failure”, and went on to transform his data using a Johnson Transformation. He submitted the following Normal Capability Analysis for the transformed data. The student was quite pleased that the Cpk value was now 1.69 and that the process appeared highly capable of meeting the requirement.
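A transformation cannot make a process more capable. It only changes the scale on which the data are viewed. The Python sketch below illustrates this with the Johnson SL (log-type) form z = γ + η·ln(x − ε). The data and parameters are hypothetical, whereas Minitab selects the Johnson family and fits its parameters automatically. This is offered as an illustration of how transformed results can mislead, not as a reconstruction of the student's analysis. Because the transformation is strictly increasing, transforming the specification limit along with the data leaves the count beyond the limit unchanged. Comparing transformed data against the untransformed limit, by contrast, gives a meaningless answer.

```python
import math


def johnson_sl(x: float, gamma: float, eta: float, epsilon: float) -> float:
    """Johnson SL transformation z = gamma + eta * ln(x - epsilon).

    With eta > 0 this is strictly increasing, so an observation is above a
    limit exactly when its transform is above the transformed limit.
    """
    if x <= epsilon:
        raise ValueError("x must exceed epsilon")
    return gamma + eta * math.log(x - epsilon)


def count_above(values: list[float], limit: float) -> int:
    """Number of values strictly greater than the limit."""
    return sum(1 for v in values if v > limit)


# Hypothetical contact times (minutes) and hypothetical SL parameters.
times = [3.1, 4.2, 4.8, 5.5, 6.0, 6.7, 7.9, 9.4, 12.0]
gamma, eta, epsilon = -3.0, 2.0, 0.0
usl = 5.0

z = [johnson_sl(t, gamma, eta, epsilon) for t in times]
z_usl = johnson_sl(usl, gamma, eta, epsilon)

print("above USL, original scale:          ", count_above(times, usl))    # 6
print("above USL, limit transformed too:   ", count_above(z, z_usl))      # 6
print("above USL, original USL on z scale: ", count_above(z, usl))        # 0
```

Six of the nine hypothetical contacts exceed five minutes on either consistent scale; only the mismatched comparison makes the problem disappear.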

I sent the student the following note in reply:

“The non-normal capability analysis indicates that the process is in fact not at all capable. That is obvious when you look at the distribution curve as compared to the spec limits.

The Johnson Transformation that you performed took the actual data that shows the process is not at all capable and artificially distorted it into data that shows the process is highly capable.  Why would you ever want to do this?  The transformed data does not represent reality in any way whatsoever.

Contrary to what you suggested, the fact that the process is not capable is not a failure; it is simply the true state of the process.”