Our Six Sigma courses include a number of lessons on hypothesis testing. The most common problems that I see with submissions for these assignments are related to how the null and alternate hypotheses are stated and how the results of the hypothesis test are expressed.

The null and alternate hypotheses must be mutually exclusive and, taken together, must cover all of the possibilities. The null hypothesis should include a null statement such as no difference, no change, or no improvement. The alternate hypothesis should state that there was a difference, a change, or an improvement.

The example that most students can relate to is that of a courtroom trial. In a criminal trial, the burden of proof is on the prosecution to prove that a crime was committed. There is no such burden for the defendant to prove that they did not commit a crime. The null hypothesis in the courtroom is that the defendant is not guilty of the crime that they are charged with committing, i.e. the null condition. The alternate hypothesis is that the defendant is guilty of the crime as charged.

For a process improvement project, the burden of proof is on the project team to show that improvement was really achieved. We might state the null and alternate hypotheses as follows:

Null hypothesis: There was no improvement to the output of the process.

Alternate hypothesis: There was improvement in the output of the process.

We test by formulating the null and the alternate hypotheses, collecting data (or evidence), and then calculating how probable it is that we would have collected that data if the null hypothesis were true. If the data would be improbable under the null hypothesis, then the null hypothesis is probably not true and it is rejected. Otherwise, we fail to reject the null hypothesis. The correct way to state the outcome of a hypothesis test is that we either reject the null hypothesis or we fail to reject the null hypothesis. We do not accept the null hypothesis, nor do we accept the alternate hypothesis.
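As a sketch of this decision rule, the following Python snippet works a one-sample t-test by hand. The sample values, the hypothesized mean of 5.0, and the critical value (2.365, from a t table for a 5% significance level with 7 degrees of freedom) are illustrative assumptions, not figures from this article.

```python
import statistics

# Made-up process measurements.
# Null hypothesis: the process mean equals 5.0 (no change).
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
mu0 = 5.0

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)       # sample standard deviation
t = (mean - mu0) / (s / n ** 0.5)  # one-sample t statistic

# Two-sided critical value for alpha = 0.05 with n - 1 = 7 degrees
# of freedom, taken from a t table.
t_crit = 2.365

if abs(t) > t_crit:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Note that the two possible printed outcomes mirror the two correct statements above: we reject, or we fail to reject; we never "accept."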

In the courtroom, the null hypothesis is not guilty. We weigh the evidence in the case and then ask ourselves how likely it is that the defendant is not guilty. If not guilty seems unlikely then we return a verdict of guilty. In other words, if we feel that there is guilt beyond a reasonable doubt, we reject the null hypothesis of not guilty and return a verdict of guilty.

If we feel that there is a reasonable probability that the null hypothesis of not guilty is true, then we fail to reject the null hypothesis due to insufficient evidence, and the conclusion is not guilty. Please note that the verdict is never returned as innocent, as we do not know for certain that the defendant is innocent. The verdict of not guilty is based on the fact that there was not enough proof, i.e. proof beyond a reasonable doubt, to declare the defendant guilty.

There are two types of errors that we can make in hypothesis testing. The first is to reject the null hypothesis when it is true. In the courtroom the null hypothesis is not guilty. If the defendant is truly not guilty and we reject the null hypothesis of not guilty, we return a guilty verdict. The result is that an innocent person is convicted. This is a grievous error, and one that we want to avoid. In practice we want the reasonable doubt that we are making the correct decision to be small. A level of five percent doubt, or 95% confidence that we are making the correct decision, is a common standard in hypothesis testing. The more confident we want to be in our decision, the more facts or evidence we need to support our conclusion.

The second type of error that we can make is to fail to reject the null hypothesis when it is false. Again, in the courtroom the null hypothesis is not guilty. If the null hypothesis is in fact false and the defendant is really guilty, and we fail to reject it, we return a verdict of not guilty. The result is that a guilty defendant goes free.
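Both error types can be made concrete with a small simulation. Everything below is a made-up illustration (a normally distributed process, a true mean of 5.0 under the null, a shift to 5.15 under the alternate, samples of 8): the point is that even a correct test procedure rejects a true null about 5% of the time, and fails to reject a false null some larger fraction of the time.

```python
import random
import statistics

random.seed(1)

T_CRIT = 2.365   # two-sided critical value, alpha = 0.05, 7 degrees of freedom
N = 8            # observations per simulated sample
TRIALS = 2000    # number of simulated hypothesis tests

def rejects_null(sample, mu0=5.0):
    # One-sample t-test decision at the 95% confidence level.
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / N ** 0.5)
    return abs(t) > T_CRIT

# Type I error: the null is true (the mean really is 5.0), yet chance
# variation sometimes leads us to reject it anyway.
type1 = sum(rejects_null([random.gauss(5.0, 0.2) for _ in range(N)])
            for _ in range(TRIALS)) / TRIALS

# Type II error: the null is false (the mean has shifted to 5.15), yet
# we sometimes fail to reject it.
type2 = sum(not rejects_null([random.gauss(5.15, 0.2) for _ in range(N)])
            for _ in range(TRIALS)) / TRIALS

print(f"Type I error rate:  {type1:.3f}")  # should land near alpha = 0.05
print(f"Type II error rate: {type2:.3f}")
```

The Type II rate depends on how large the real shift is and how much data we collect, which echoes the point above: more confidence requires more evidence.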

In the example of the process improvement project, to test the null hypothesis we collect data about the output of the process before and after improvement, and we calculate how likely it is that we would observe a difference at least as large as the one in our data if the process output had not actually changed (this can be done with a statistical test, or with a confidence interval on the difference). If this probability turns out to be small, i.e. smaller than the level of reasonable doubt we are willing to accept (usually five percent), then we reject the null hypothesis of no improvement and conclude that there was statistically significant improvement. Otherwise, we fail to reject the null hypothesis and conclude that there was no statistically significant improvement.
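A minimal sketch of such a before/after comparison, using a pooled two-sample t statistic computed by hand: the measurements (imagined as defects per batch, where lower is better), the equal-variance assumption, and the critical value (2.101, from a t table for a 5% significance level with 18 degrees of freedom) are all illustrative assumptions.

```python
import statistics

# Made-up before/after measurements of a process output (defects per
# batch; lower is better).
before = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
after = [10, 11, 9, 12, 10, 11, 8, 10, 12, 9]

n1, n2 = len(before), len(after)
m1, m2 = statistics.mean(before), statistics.mean(after)
v1, v2 = statistics.variance(before), statistics.variance(after)

# Pooled two-sample t statistic (assumes roughly equal variances).
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Two-sided critical value for alpha = 0.05 with n1 + n2 - 2 = 18
# degrees of freedom, taken from a t table.
if abs(t) > 2.101:
    print("Reject the null hypothesis: statistically significant improvement")
else:
    print("Fail to reject the null hypothesis: no statistically significant improvement")
```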

As a final note on the process improvement example, improvement must be of practical significance as well as of statistical significance. It is possible for improvement to be statistically significant but small enough that it is of no practical value.
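One common way to put a number on practical significance is a standardized effect size such as Cohen's d, alongside a minimum improvement the team agrees is worth acting on. The data and the 2.0 threshold below are made-up illustrations, and the effect-size interpretation thresholds are conventional rules of thumb, not part of this article.

```python
import statistics

# Made-up before/after measurements (defects per batch; lower is better).
before = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
after = [10, 11, 9, 12, 10, 11, 8, 10, 12, 9]

mean_diff = statistics.mean(before) - statistics.mean(after)

# Cohen's d: the mean difference measured in units of the pooled
# standard deviation (equal sample sizes, so a simple average of
# the variances suffices).
pooled_sd = (0.5 * (statistics.variance(before) + statistics.variance(after))) ** 0.5
d = mean_diff / pooled_sd

# The smallest improvement worth acting on, set by the team in
# advance; the 2.0 here is an arbitrary illustration.
practical_threshold = 2.0

print(f"Mean improvement: {mean_diff:.1f}, Cohen's d: {d:.2f}")
print("Practically significant" if mean_diff >= practical_threshold
      else "Not practically significant")
```

A result can clear the statistical bar yet fall below the practical threshold, which is exactly the trap the paragraph above warns about.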

Your comments or questions about this article are welcome, as are suggestions for future articles. Feel free to contact me by email at roger@keyperformance.com.

About the author: Mr. Roger C. Ellis is an industrial engineer by training and profession. He is a Six Sigma Master Black Belt with over 45 years of business experience in a wide range of fields. Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC. For a more detailed biography, please refer to www.keyperformance.com.