# The Power of a Statistical Test

When we collect data to test a hypothesis, we must manage the tradeoff between the size of the sample we collect, the minimum size of the difference or effect that the test will detect, and the power of the test.

Say we want to use a hypothesis test to determine if a process has been improved in a statistically significant manner. We plan to collect sample data from the process before and after improvement and use the two groups of sample data for the hypothesis test. The null hypothesis is that there is no difference in performance before and after changes to the process. The alternative hypothesis is that there is a difference in performance before and after changes to the process.

Nothing is guaranteed when we select samples from a population and use those samples to draw conclusions about the population. The estimated effects that come out of a study can be due to either a real effect being present, or random sample error. Our ability to determine if an effect is statistically significant depends on the minimum size of the effect we need to detect, the size of the samples and the variability present in the sample data. These three factors are intertwined.

The larger the difference or effect between two groups of data, the easier it is to detect. Larger sample sizes allow hypothesis tests to detect smaller differences between groups. Less variability in the data also makes smaller differences easier to detect.

When planning a hypothesis test, we must balance the minimum size of the difference or effect between two groups of data that we need to detect, the cost of collecting data, and the certainty that we need in our results (the power of the test), considering the amount of variability present in the population. Does your test need to detect subtle effects or only large shifts? Is it inexpensive or expensive to collect data? How much certainty do we need in our results? Is there a critical need for a high level of certainty (food safety) or is the need less critical (the dimensions of toothpicks)?

We need a test that has the necessary power for the job. We want to avoid an underpowered study that has a low probability of detecting an important effect. At the same time, we want to avoid spending excessive time and effort collecting data that results in a test so powerful that it will detect an effect that is too small to be of any practical significance.

The measure of this certainty is the power of the test. The power of a test is the probability that it will detect an effect of a given size if that effect truly exists — formally, the probability of rejecting the null hypothesis when the alternative hypothesis is true. Note that power is not the probability that an observed effect is real; it is a property of the test itself, fixed before any data are seen. If a study has 80% power, it has an 80% chance of detecting an effect of the specified size if one really exists.

As the variability in the population (as measured by standard deviation) decreases, power increases. As the minimum size of the effect that we need to detect increases, power increases. As the sample size increases, power increases.
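These three relationships can be sketched numerically. Below is a minimal Python illustration using the closed-form power of a two-sided, two-sample z-test — a simplification that assumes a known standard deviation, equal in both groups. The function name and defaults here are our own illustration, not taken from any statistics package.

```python
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta:       true difference between group means we want to detect
    sigma:       known population standard deviation (equal in both groups)
    n_per_group: sample size in each group
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                    # e.g. ~1.96 for alpha = 0.05
    shift = delta / (sigma * (2.0 / n_per_group) ** 0.5)  # standardized shift of the statistic
    # Probability the test statistic falls beyond either critical value
    # when the true difference is delta:
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

base          = power_two_sample_z(delta=0.5, sigma=1.0, n_per_group=64)   # roughly 0.80
more_samples  = power_two_sample_z(delta=0.5, sigma=1.0, n_per_group=100)  # power rises
bigger_effect = power_two_sample_z(delta=0.8, sigma=1.0, n_per_group=64)   # power rises
less_variance = power_two_sample_z(delta=0.5, sigma=0.7, n_per_group=64)   # power rises
```

Each comparison against the base case demonstrates one of the three relationships above: more data, a larger effect, or less variability all raise power.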

Using the known population standard deviation as an input, statistical software such as Minitab will calculate any one of the following three variables, given the other two as inputs:

- **Power of the test** — input the population standard deviation, the minimum size of the effect that we need to detect, and our intended sample size.
- **Minimum detectable effect** — input the population standard deviation, the power that we need, and the sample size that we intend to collect.
- **Required sample size** — input the population standard deviation, the power of the test that we need, and the minimum size of the effect we need to detect.
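Two of these calculations have simple closed forms under the z-test approximation, which makes the three-way relationship concrete. The sketch below is a simplified stand-in for what dedicated software computes (Minitab uses exact t-distribution methods); the function names are our own.

```python
import math
from statistics import NormalDist

def required_n_per_group(delta, sigma, power=0.80, alpha=0.05):
    """Smallest per-group sample size for a two-sided two-sample z-test
    to detect a true difference of delta with the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def min_detectable_effect(sigma, n_per_group, power=0.80, alpha=0.05):
    """Smallest true difference detectable with the given power
    and per-group sample size."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sigma * math.sqrt(2.0 / n_per_group)
```

For example, detecting a difference of half a standard deviation with 80% power calls for about 63 observations per group under this approximation, and plugging that sample size back into `min_detectable_effect` recovers the same half-standard-deviation difference.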