### Overdispersion of data

One of my Master Black Belt students sent me an inquiry about a manufacturing process that was producing a very small percentage of scrap.  He was having trouble understanding the control charts that he had created from his data and was convinced that he needed to perform a transformation on the underlying data.

I started a Minitab project using his scrap data and first ran a P chart of the proportion of scrap.  The P chart was chosen because the data are counts of defectives, i.e., a unit with one or more defects counts as one defective unit.  I also selected all of the tests for special cause variation.  Here is the resulting P chart.

Two things are apparent.  One, the process is not at all stable when evaluated using this chart and this set of data.  Two, the proportion of defectives is very small, which was noted previously.

I next ran the P chart diagnostic routine, which helps you decide if the Laney P’ chart may be more appropriate.  I suspected that the Laney P’ would be recommended.  The Laney P’ chart is used if you have large subgroups of data and the data is overdispersed.

Quoting from Minitab Help:  “Overdispersion exists when there is more variation in your data than you would expect based on a binomial distribution (for defectives) or a Poisson distribution (for defects). Traditional P charts and U charts assume that your rate of defectives or defects remains constant over time. However, external noise factors, which are not special causes, normally cause some variation in the rate of defectives or defects over time.

The control limits on a traditional P chart or U chart become narrower when your subgroups are larger. If your subgroups are large enough, overdispersion can cause points to appear to be out of control when they are not.”
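To see why large subgroups tighten the limits, here is a minimal Python sketch of the standard 3-sigma P chart limits, p̄ ± 3·sqrt(p̄(1 − p̄)/n). The function name `p_chart_limits` and the 1% scrap rate are illustrative assumptions, not values from my student's data.

```python
import math


def p_chart_limits(p_bar, n):
    """Return (LCL, UCL) for a traditional P chart with subgroup size n."""
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - 3 * sigma)  # a proportion cannot go below 0
    ucl = min(1.0, p_bar + 3 * sigma)  # or above 1
    return lcl, ucl


# Hypothetical 1% scrap rate at three subgroup sizes.
for n in (100, 10_000, 1_000_000):
    lcl, ucl = p_chart_limits(0.01, n)
    print(f"n={n:>9,}: LCL={lcl:.5f}  UCL={ucl:.5f}  width={ucl - lcl:.5f}")
```

The width of the band shrinks with the square root of the subgroup size, so with very large subgroups even ordinary day-to-day noise in the scrap rate can fall outside the limits.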

Minitab includes a diagnostic tool to evaluate whether or not the Laney P’ chart is recommended.  Here is the resulting diagnostic for the proportion of defectives.  The Laney P’ chart was recommended.
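I have not reproduced Minitab's diagnostic calculation here. As a rough stand-in, one common check for overdispersion is the Pearson dispersion statistic: the chi-square sum of squared deviations from the binomial expectation, divided by its degrees of freedom. The sketch below assumes that approach; the function name `pearson_dispersion` and the counts are made up for illustration.

```python
def pearson_dispersion(defectives, sizes):
    """Pearson chi-square / (k - 1) for binomial counts.

    Values near 1 are consistent with binomial variation;
    values well above 1 suggest overdispersion.
    """
    if len(defectives) != len(sizes) or len(sizes) < 2:
        raise ValueError("need at least two subgroups of matching length")
    p_bar = sum(defectives) / sum(sizes)
    if not 0 < p_bar < 1:
        raise ValueError("overall proportion must be strictly between 0 and 1")
    chi2 = sum(
        (d - n * p_bar) ** 2 / (n * p_bar * (1 - p_bar))
        for d, n in zip(defectives, sizes)
    )
    return chi2 / (len(sizes) - 1)


# Made-up data: large subgroups whose scrap rate drifts from day to day.
sizes = [100_000] * 6
defectives = [100, 300, 150, 400, 120, 350]
print(f"dispersion ratio = {pearson_dispersion(defectives, sizes):.1f}")
```

A ratio far above 1 with large subgroups is the situation in which a Laney P’ chart is worth considering.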

Next I ran the Laney P’ chart for the proportion of scrap, again selecting all tests for special cause variation.  Here is the resulting control chart, which shows a quite different picture of the process.
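For readers without Minitab, the Laney P’ limits can be sketched in a few lines of Python. This follows the calculation as I understand it from Laney's published method: convert each subgroup proportion to a z-score using the binomial sigma, estimate σ_z from the average moving range of the z-scores (divided by 1.128), and inflate the usual limits by σ_z. The function name and the data are hypothetical, and this is a sketch rather than Minitab's implementation.

```python
import math


def laney_p_prime_limits(defectives, sizes):
    """Return (p_bar, sigma_z, [(LCL, UCL), ...]) for a Laney P' chart."""
    if len(defectives) != len(sizes) or len(sizes) < 2:
        raise ValueError("need at least two subgroups of matching length")
    p_bar = sum(defectives) / sum(sizes)
    if not 0 < p_bar < 1:
        raise ValueError("overall proportion must be strictly between 0 and 1")

    # Binomial (within-subgroup) sigma for each subgroup size.
    sigma_p = [math.sqrt(p_bar * (1 - p_bar) / n) for n in sizes]

    # Standardize each subgroup proportion.
    z = [(d / n - p_bar) / s for d, n, s in zip(defectives, sizes, sigma_p)]

    # sigma_z from the average moving range of the z-scores (d2 = 1.128).
    moving_ranges = [abs(b - a) for a, b in zip(z, z[1:])]
    sigma_z = (sum(moving_ranges) / len(moving_ranges)) / 1.128

    # Limits are the P chart limits widened (or narrowed) by sigma_z.
    limits = [
        (max(0.0, p_bar - 3 * s * sigma_z), min(1.0, p_bar + 3 * s * sigma_z))
        for s in sigma_p
    ]
    return p_bar, sigma_z, limits


# Made-up data, not the scrap data discussed in this post.
sizes = [100_000] * 6
defectives = [100, 300, 150, 400, 120, 350]
p_bar, sigma_z, limits = laney_p_prime_limits(defectives, sizes)
print(f"p_bar={p_bar:.5f}  sigma_z={sigma_z:.2f}")
print(f"first subgroup limits: LCL={limits[0][0]:.5f}  UCL={limits[0][1]:.5f}")
```

When σ_z is close to 1 the Laney P’ chart looks essentially like the traditional P chart; when it is well above 1 the limits widen to reflect the extra between-subgroup variation.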

There was clearly trouble with the process around data points 20 to 30, and there was a large outlier at point 54 (over 2000 defectives), which was an obvious special cause.

I next ran a Laney P’ chart for the proportion of scrap for data points 55 to the end of the file, again with all tests for special cause selected.  The resulting chart follows.

This final control chart gives us a much more realistic look at what is going on with the process currently than did the original chart.

Finally, I would not transform this data.  There is no compelling reason to perform a transformation, and doing so would simply confuse the picture of what is really going on with the process.