### Don’t like the data? Throw it out!

One of the first tasks our Black Belt students are asked to do when completing a project is to collect data over time and create control charts for the primary output metric. The purpose of these control charts is to establish baseline performance and to determine whether the process is stable.
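For readers who want to see the arithmetic behind such a chart, here is a minimal sketch of an individuals (X) chart, assuming one measurement per time period. The limits are the mean plus or minus 2.66 times the average moving range, the usual constant for moving ranges of two consecutive points. The function names and the cycle times are made up for illustration, and the sketch checks only for points beyond the limits, not for runs or trends.

```python
from statistics import mean

def xmr_limits(values):
    """Return (center, lower, upper) limits for an individuals (X) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of size 2.
    upper = center + 2.66 * mr_bar
    lower = max(0.0, center - 2.66 * mr_bar)  # a duration cannot be negative
    return center, lower, upper

def out_of_control_points(values):
    """Return (index, value) pairs that fall outside the control limits."""
    _, lower, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical cycle times in days, invented for this example.
cycle_times = [2, 3, 2, 4, 3, 2, 3, 9, 2, 3, 4, 2, 3, 2, 3]
print(xmr_limits(cycle_times))
print(out_of_control_points(cycle_times))
```

With these invented numbers, the nine-day point is the only one flagged. That flag is a prompt to investigate what happened on that occasion, not a judgment about the data point itself.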

One of my students is working on a project where the time in days to complete a process is the primary output metric.  The requirement is that the process be completed in a maximum time of three days.  She recently submitted control charts with the following comment:  “Initially my data was out of control and needed to be adjusted”.

She characterized 10 data points that exceeded seven days as “outliers” and arbitrarily changed those durations to six days. Her logic in changing the data was “because to the company the important information is if it took more than 3 days…”

I sent her a note explaining that if a process is out of control, it is out of control.  There is no reason to adjust the data to make it look any different than what it really is.  I went on to ask her what criteria she used to conclude that these data points were outliers.

She replied as follows: “The data points that were shown on my initial control charts as ‘out of control’ points were considered outliers. I believe we can remove outliers if we understand what caused them and they were unique to the normal process. Instead of changing them to 6 days, I should have just removed them.”

I replied that it is not appropriate to remove or change data points that were generated by special causes. The correct approach in practice is to eliminate the special causes of variation and then collect new data to re-evaluate the performance of the process. By removing special causes I do not mean throwing out the data points; I mean stopping the special causes from happening. If the special causes are removed and the process is still not meeting the requirement, then a process change is indicated.
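To make the point concrete, here is a small sketch of what an adjustment like hers can do to a chart. It uses invented cycle times, not her data, and a hypothetical helper named `signals` that flags points more than 2.66 average moving ranges from the mean (the usual individuals-chart limits).

```python
from statistics import mean

def signals(values):
    """Indices of points beyond individuals-chart limits (mean +/- 2.66 * MR-bar)."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)
    return [i for i, v in enumerate(values) if abs(v - center) > half_width]

# Hypothetical cycle times in days, invented for this example.
observed = [2, 3, 2, 4, 3, 2, 3, 9, 2, 3, 4, 2, 3, 2, 3]
# The adjustment described in the post: anything over seven days becomes six.
capped = [6 if v > 7 else v for v in observed]

print(signals(observed))  # the nine-day point is flagged
print(signals(capped))    # nothing is flagged after the adjustment

# The count of jobs over the three-day requirement is the same either way.
print(sum(v > 3 for v in observed), sum(v > 3 for v in capped))
```

In this made-up case the adjusted data still show the same number of jobs over three days, but the signal that something unusual happened has disappeared from the chart. The special cause is still in the process; only the evidence of it is gone.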

The learning point here is that, to the greatest extent possible, we need to use the data as collected, without any adjustments, to give us a clear picture of what is really going on with the process.