When we work on process improvement, we need to check whether the output of the process actually changed for the better. There are two ways to evaluate the results: practical significance and statistical significance. We need to verify that the results are significant in both respects. At the risk of stating the obvious, we first need to confirm that the change produced an improvement at all, because a change made in the expectation of better output can in fact make it worse. Once we have verified that the results are positive, the test for practical significance is straightforward. We ask whether the results are large enough to be meaningful to the customers and the owners of the business. If the answer is yes, then the results are of practical significance.

When the process output contains a large degree of normal variation, the before and after results can appear to be different and yet come from the same distribution. We test for statistical significance by using a two-sample hypothesis test to compare data from before and after improvement. The specific two-sample test used depends in part on the type of data (continuous or attribute) and on whether the data is normally distributed.

The null hypothesis is that no improvement has been made, i.e. that the before and after data samples come from the same underlying distribution. If we cannot reject the null hypothesis, the change is not statistically significant. The alternate hypothesis is that the process output has been improved, i.e. that the before and after data samples come from different distributions. If we reject the null hypothesis in favor of the alternate, the change is statistically significant.

Here are two examples from a case study on process improvement when making telephone sales calls. The two key process output metrics are orders per hour per agent and conversion rate. The underlying data for both is attribute data. The goals of the project are to increase the orders per hour per agent to four and to double the current conversion rate.

In the first case, the calculated metric of orders per hour per agent can take on any value, so it may be treated as continuous data. Before improvement we collected 25 samples and determined that the mean of orders per hour per agent was 2.292 and the standard deviation was 0.4643. After improvement we collected 23 samples and found that the mean of orders per hour per agent was 4.103 and the standard deviation was 0.4082. The result exceeded the goal of four orders per hour per agent and therefore is of practical significance.

The data before and after improvement was tested for normality, and in both cases the data is normally distributed. The data was also tested for stability using I-MR control charts, and the process is stable before and after improvement. Finally, the two samples are independent, so the appropriate hypothesis test is the 2-sample t test. The before and after data was entered in Minitab using Stat > Basic Statistics > 2-Sample t with the Summarized Data option, a 95% confidence level, and an assumed difference in means of zero. Minitab returned a p value of 0.000 (displayed to three decimals, so smaller than 0.0005), which is less than the alpha decision level of 0.05 for 95% confidence. As a result, we reject the null hypothesis of no improvement and conclude that the change is of statistical significance. Also, the 95% confidence interval for the difference in means (before minus after) is -2.065 to -1.557. This interval does not include zero, again confirming that the difference between the before and after data is statistically significant.
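The summarized-data calculation can also be sketched outside Minitab. The following is a minimal illustration in Python using only the standard library, not a reproduction of Minitab's internals. It assumes the unequal-variance (Welch) form of the 2-sample t test, and the function name `welch_summary` and the constant `T_CRIT` are hypothetical names chosen for this example. The critical value is taken from a t table rather than computed.

```python
import math

def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t quantities from summarized data, without assuming equal variances."""
    var1 = sd1 ** 2 / n1  # squared standard error of the first mean
    var2 = sd2 ** 2 / n2  # squared standard error of the second mean
    diff = mean1 - mean2  # before minus after
    se = math.sqrt(var1 + var2)
    t_stat = diff / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return diff, se, t_stat, df

# Summary statistics from the case study: before (n=25) and after (n=23)
diff, se, t_stat, df = welch_summary(2.292, 0.4643, 25, 4.103, 0.4082, 23)

# Assumption: 2.013 is the two-sided 95% critical value of t for roughly
# 46 degrees of freedom, read from a t table (not computed here).
T_CRIT = 2.013
ci_low, ci_high = diff - T_CRIT * se, diff + T_CRIT * se

print(f"difference = {diff:.3f}, t = {t_stat:.2f}, df = {df:.1f}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

Under these assumptions the interval agrees with the one reported above to three decimals, and the magnitude of the t statistic is far larger than the critical value, which is why the p value displays as 0.000.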

In the second case, we are comparing the number of sales (attribute data) to the number of calls made (also attribute data) to calculate a conversion rate or proportion. Before improvement we made 26931 sales out of 128400 calls for a conversion rate of 20.97%. After improvement we made 13874 sales out of 32527 calls for a conversion rate of 42.65%. The after rate is slightly more than double the before rate (double would be 41.94%), so the results met the goal of doubling the conversion rate and therefore are of practical significance.
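The conversion-rate arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch, and the variable names are chosen for this example.

```python
# Counts from the case study
before_sales, before_calls = 26931, 128400
after_sales, after_calls = 13874, 32527

before_rate = before_sales / before_calls
after_rate = after_sales / after_calls

# A ratio of 2.0 or more means the conversion rate at least doubled
ratio = after_rate / before_rate

print(f"before: {before_rate:.2%}, after: {after_rate:.2%}, ratio: {ratio:.2f}")
```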

The appropriate hypothesis test here is the 2 Proportions test. The data was entered in Minitab using Stat > Basic Statistics > 2 Proportions with the Summarized Data option, a 95% confidence level, and the option to examine proportions separately. Minitab returned a p value of 0.000, which is less than the alpha decision level of 0.05 for 95% confidence. As a result, we reject the null hypothesis of no improvement and conclude that the change is of statistical significance. Also, the 95% confidence interval for the difference in proportions (before minus after) is -0.223 to -0.211. This interval does not include zero, again confirming that the difference between the before and after data is statistically significant.
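As a cross-check on the 2 Proportions result, here is a minimal Python sketch using only the standard library. It relies on the normal approximation and assumes that examining the proportions separately corresponds to an unpooled standard error. The function name `two_proportion_summary` is hypothetical, and this is not a description of Minitab's exact method.

```python
import math
from statistics import NormalDist

def two_proportion_summary(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation test and interval for p1 - p2, using an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2  # before minus after
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_stat = diff / se
    # Two-sided p value from the standard normal distribution
    p_value = 2 * NormalDist().cdf(-abs(z_stat))
    # Critical value for the requested confidence level (about 1.96 for 95%)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, z_stat, p_value, (diff - z_crit * se, diff + z_crit * se)

# Sales and calls from the case study: before, then after
diff, z_stat, p_value, (ci_low, ci_high) = two_proportion_summary(
    26931, 128400, 13874, 32527
)

print(f"difference = {diff:.4f}, z = {z_stat:.1f}, p < 0.05: {p_value < 0.05}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

With samples this large the z statistic is extreme, so the p value is effectively zero and the interval is narrow and well away from zero.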

Your comments or questions about this article are welcome, as are suggestions for future articles. Feel free to contact me by email at roger@keyperformance.com.

About the author: Mr. Roger C. Ellis is an industrial engineer by training and profession. He is a Six Sigma Master Black Belt with over 50 years of business experience in a wide range of fields. Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC. For a more detailed biography, please refer to www.keyperformance.com.