Problems and Issues with Control Charts

Statistical process control charts, also known as Shewhart control charts, are introduced in our Green Belt course. At the Black Belt level, during the Define phase of their project, students are tasked with developing control charts for the primary Y variable that will be the focus of improvement.

Before we can rely on the information generated from data, we need to know that the data come from a stable process, i.e. a process that is in a state of statistical control.   You may wish to review article four in this series entitled Process Capability and Process Control if you need a refresher on these topics.

As a brief review, control charts are used to monitor the output of a process over time, and to determine whether or not the process is stable and exhibiting common cause variation, or unstable and exhibiting special cause variation.  If special cause variation is detected, the special causes should be eliminated and the output of the process reevaluated before considering any fundamental changes to the process.  If common cause variation is present and the output of the process is not acceptable, then changes to the process will be required in order to improve performance.
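To make the review concrete, here is a minimal sketch of how control limits might be computed for one common chart type, the individuals (XmR) chart. It assumes one measurement per time period, already in date order. The function names and the sample data are hypothetical, and only the basic "point outside the limits" signal is checked, not the additional run rules.

```python
from statistics import mean

def individuals_chart_limits(values):
    """Return (lcl, center, ucl) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def out_of_control_points(values):
    """Return the indexes of points that fall outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical data for illustration only (far fewer points than a real chart needs).
sample = [10, 11, 10, 11, 10, 11, 10, 11, 10, 30]
print(individuals_chart_limits(sample))
print(out_of_control_points(sample))
```

A point flagged by this check is a candidate special cause to investigate. It is not proof by itself that something went wrong.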

Selecting the correct type of control chart for the situation at hand, and collecting data in the proper manner to create it, are among the biggest challenges for most of my Black Belt students.

The first stumbling block for most students is failing to recognize that data must be gathered over time, and must be ordered by date (oldest date first) in order to properly create control charts. In addition, the time interval between subgroups of data should be as equal as possible. For example, collect data every hour, every day, or every week.
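The ordering and spacing requirements can be checked before any chart is drawn. The sketch below assumes each record is a hypothetical (date, value) pair. It sorts the records oldest first and reports the gap in days between consecutive records, so that uneven intervals are easy to spot.

```python
from datetime import date

def order_and_check_spacing(records):
    """Sort (date, value) records oldest first and return them with the day gaps."""
    ordered = sorted(records, key=lambda rec: rec[0])
    gaps = [(later[0] - earlier[0]).days
            for earlier, later in zip(ordered, ordered[1:])]
    return ordered, gaps

def evenly_spaced(gaps):
    """True when every interval between consecutive records is the same."""
    return len(set(gaps)) <= 1

# Hypothetical records, deliberately out of order.
records = [(date(2024, 3, 3), 7.1), (date(2024, 3, 1), 6.8), (date(2024, 3, 2), 7.4)]
ordered, gaps = order_and_check_spacing(records)
print(ordered)
print(gaps, evenly_spaced(gaps))
```

In practice, perfectly equal intervals are not always possible. This check is meant to prompt a closer look, not to serve as a strict pass/fail rule.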

The second issue is the frequency of data collection. The more frequently data are collected, and the shorter the time horizon, the more micro the resulting view of the process. The less frequently data are collected, and the longer the time horizon, the more macro the view. One student recently collected data over just one day and used the data to create control charts. Because they covered only a single day, these control charts were of no use in evaluating the performance of the process over time.

Third, prior to collecting data, stratification should be considered. The idea behind stratification is to subdivide the overall population of data into two or more groups based on some factor that may influence the data. One of my recent students had two basic types of customers requesting medical appointments: those who have an emergency and those who do not. He was concerned about the wait time for customers to get an appointment. The data were not stratified, i.e. the data from both types of customers were mixed together. As a result, the analysis could not show whether wait times differed between the two types of customers.
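As an illustration of stratification, the sketch below splits appointment records by customer type before any analysis is done. The field names ("type", "wait_days") and the numbers are made up for the example and are not the student's actual data.

```python
from collections import defaultdict
from statistics import mean

def stratify(records, key):
    """Group records (dicts) by the value stored under `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

def mean_wait_by_group(records, key="type", value="wait_days"):
    """Average the `value` field separately within each stratum."""
    return {name: mean(r[value] for r in recs)
            for name, recs in stratify(records, key).items()}

# Hypothetical appointment records.
appointments = [
    {"type": "emergency", "wait_days": 1},
    {"type": "routine", "wait_days": 10},
    {"type": "emergency", "wait_days": 2},
    {"type": "routine", "wait_days": 14},
]
print(mean_wait_by_group(appointments))
```

Once the data are split this way, a separate control chart can be built for each group, so a difference between the groups is not hidden in a combined chart.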

A fourth issue is the number of subgroups of data to be collected and analyzed. We need 25 or more subgroups, arranged in date order, to generate meaningful control charts. So we will need data from a minimum of 25 time intervals: for example, 25 days, 25 weeks, or 25 months. Students often fail to recognize this point and will create control charts with as few as four or five data points.
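A quick check of the subgroup count can catch this problem early. The sketch below assumes observations arrive as hypothetical (date, value) pairs and that each calendar day forms one subgroup. The 25-subgroup threshold is the guideline stated above.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_SUBGROUPS = 25  # guideline from the text

def subgroups_by_day(observations):
    """Group (date, value) observations into one subgroup per day, in date order."""
    groups = defaultdict(list)
    for day, value in observations:
        groups[day].append(value)
    return [(day, groups[day]) for day in sorted(groups)]

def enough_subgroups(subgroups, minimum=MIN_SUBGROUPS):
    """True when there are at least `minimum` subgroups to chart."""
    return len(subgroups) >= minimum

# Hypothetical data: 5 days with 4 observations each, which is too few subgroups.
start = date(2024, 1, 1)
obs = [(start + timedelta(days=d), 10 + d + i) for d in range(5) for i in range(4)]
subgroups = subgroups_by_day(obs)
print(len(subgroups), enough_subgroups(subgroups))
```

Note that the count that matters is the number of time intervals, not the total number of measurements. Twenty observations from five days still yield only five subgroups.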

Fifth, when selecting sample data from a larger population of items in order to create control charts, the manner of selection should be random and representative of the larger population. Going back to our prior example of wait times for medical appointments, failing to select customers randomly has the potential to introduce bias into our sample. For example, if we deliberately selected the 10 shortest wait times from a given week, a large amount of bias would be introduced. In addition, samples (subgroups) should be homogeneous, i.e. they should be selected over a short period of time, so that we accurately capture what is going on in the process at a given point in time.
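The difference between a random subgroup and a biased one can be sketched in a few lines. The example assumes a hypothetical list of one week's wait times. It uses Python's standard `random.Random.sample` to draw without replacement, and the optional seed is only there to make a draw repeatable.

```python
import random
from statistics import mean

def draw_random_subgroup(wait_times, size, seed=None):
    """Draw `size` wait times at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(wait_times, size)

def draw_shortest_subgroup(wait_times, size):
    """The biased selection described above: keep only the shortest waits."""
    return sorted(wait_times)[:size]

# Hypothetical wait times (in days) for one week of appointment requests.
week = [3, 12, 7, 1, 21, 9, 5, 14, 2, 8, 17, 6, 4, 11, 19, 10, 13, 15, 16, 18]
print(mean(week))
print(mean(draw_random_subgroup(week, 10, seed=1)))
print(mean(draw_shortest_subgroup(week, 10)))
```

The mean of the shortest-waits subgroup can never exceed the mean of the full week, which is exactly the bias the text warns about. A random draw has no such built-in direction.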