# The Central Limit Theorem

Although the central limit theorem can seem abstract and devoid of any practical application, it is actually quite important to the practice of statistics.  As we will see, the theorem lets us make reliable assumptions about how statistics calculated from samples are distributed, even when we know little about the shape of the population itself.

To understand the basis for the Central Limit Theorem, consider a population of items whose mean (average) value we would like to know.  We begin with a sample taken at random from this population.  From the sample we calculate a sample statistic, such as the sample mean, which serves as an estimate of the corresponding population value (the population parameter).

For example, suppose we are interested in knowing the mean (i.e., average) height of male students enrolled at a large university.  The population of interest in this case is all male students at that university.  We select a random sample of 10 students, measure their heights, and use these data to calculate the mean height for those ten students.  However, the mean calculated from a single sample of ten students may or may not accurately represent the true mean of the population.

Let us further assume that we independently take a number of additional random samples of the same size from the same population, and then compute the sample mean for each of these additional samples.

For example, we take a second random sample of 10 male students from the population of male university students, measure their heights, and calculate a mean height for this second sample.  We repeat the process many more times, say a total of 30 times, each time calculating a new sample mean.  We should expect each of these 30 sample means to differ slightly from the others.
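The repeated-sampling procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not real data: the population is a hypothetical set of 10,000 heights drawn from a normal distribution with mean 175 cm and standard deviation 7 cm, and the helper name `sample_means` is our own.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples independent random samples (each without replacement)
    and return the mean of each sample. Hypothetical helper for illustration."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 male student heights in cm (assumed values)
population = [rng.gauss(175, 7) for _ in range(10_000)]

# 30 samples of 10 students each, matching the example in the text
means = sample_means(population, sample_size=10, num_samples=30, rng=rng)

print(f"population mean:             {statistics.mean(population):.1f} cm")
print(f"first three sample means:    {[round(m, 1) for m in means[:3]]}")
print(f"spread of sample means (SD): {statistics.stdev(means):.2f} cm")
```

With these assumed values the sample means cluster much more tightly than individual heights do; their standard deviation should come out near 7/√10 ≈ 2.2 cm rather than 7 cm.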

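The central limit theorem concerns the distribution of the resulting sample statistics, in this case the distribution of the 30 sample means (often called the sampling distribution of the mean).  The theorem says that, provided each sample is sufficiently large, this distribution is approximately normal, i.e., it approximates the normal bell-shaped curve, centered on the population mean.  What matters is the size of each sample, not the number of samples: the approximation improves as the sample size grows.  A common rule of thumb treats samples of 30 or more observations as large enough, while smaller samples, such as our samples of 10, suffice when the population is itself roughly symmetric, as heights tend to be.  Taking more samples only gives a clearer picture of the distribution's shape.  In our example, a histogram of the 30 sample means will tend to resemble the normal bell-shaped curve.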
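Stated a little more formally, in the classical version of the theorem for independent, identically distributed observations: let each sample consist of $n$ observations $X_1, \dots, X_n$ drawn from a population with mean $\mu$ and finite standard deviation $\sigma$. Then

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty .
$$

In practice this is read as $\bar{X}_n \approx N(\mu,\, \sigma^2/n)$ for large $n$: the sample means center on the population mean, and their standard deviation (the standard error of the mean) shrinks in proportion to $1/\sqrt{n}$.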

The astonishing fact is that, according to the Central Limit Theorem, this approximately normal distribution of the sample means arises regardless of the underlying distribution in the population, provided only that the population has a finite variance.  The underlying distribution may be uniform, exponential, skewed in some fashion, or of almost any other shape.  The distribution of the sample means will be approximately normal, given a sufficiently large sample size; the more skewed the population, the larger the samples need to be.  And the Central Limit Theorem doesn’t just apply to the sample mean; it also applies to other statistics built from averages, such as the sample proportion, which is simply the mean of a set of 0/1 (yes/no) outcomes.
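The claim that the population's shape does not matter can be checked by simulation. The sketch below assumes an exponential population with mean 1 (a strongly right-skewed distribution whose standard deviation is also 1 and whose skewness is 2) and compares individual values with the means of samples of size 30. The `skewness` helper is our own simple moment-based estimate, not a library function.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(1)
n = 30                # size of each sample (assumed)
num_samples = 5_000   # number of samples drawn

# Assumed skewed population: exponential with rate 1 (mean 1, SD 1, skewness 2)
individuals = [rng.expovariate(1.0) for _ in range(num_samples)]
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print(f"skewness of individual values: {skewness(individuals):.2f} (theory: 2)")
print(f"skewness of sample means:      {skewness(means):.2f} "
      f"(theory: {2 / math.sqrt(n):.2f})")
print(f"mean of sample means:          {statistics.fmean(means):.3f} (theory: 1)")
print(f"SD of sample means:            {statistics.stdev(means):.3f} "
      f"(theory: {1 / math.sqrt(n):.3f})")
```

The individual values remain heavily skewed, while the sample means are far closer to symmetric, centered on the population mean with a spread near σ/√n. The residual skewness of the means shrinks as `n` grows, which is why more skewed populations call for larger samples.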

One important application of the central limit theorem is in control charts, such as X-bar and R charts and X-bar and S charts, which rely on statistics calculated from subgroups of continuous data.  A normal distribution for the variable being monitored is an underlying assumption for these charts.  The Central Limit Theorem allows us to use the X-bar chart even when the individual measurements are not normally distributed, because the subgroup averages being charted are approximately normally distributed.
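A minimal sketch of an X-bar chart on skewed data follows. It assumes the process mean and standard deviation are known (the "standard given" form of the chart, with limits at mean ± 3σ/√n); charts built in practice usually estimate these from the data instead, for example from the grand average and the average range. The exponential process, the subgroup size of 5, and the helper name `xbar_limits` are assumptions for illustration.

```python
import math
import random
import statistics


def xbar_limits(mu, sigma, n):
    """Return (LCL, center line, UCL) for means of subgroups of size n,
    assuming the process mean mu and standard deviation sigma are known."""
    se = sigma / math.sqrt(n)  # standard deviation of a subgroup mean
    return mu - 3 * se, mu, mu + 3 * se


rng = random.Random(7)
n = 5                  # subgroup size (assumed)
num_subgroups = 20_000

# Hypothetical in-control process: exponential with mean 1 and SD 1 (skewed)
lcl, center, ucl = xbar_limits(mu=1.0, sigma=1.0, n=n)
xbars = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_subgroups)
]
out_of_limits = sum(1 for x in xbars if x < lcl or x > ucl)

print(f"limits: LCL={lcl:.3f}, CL={center:.3f}, UCL={ucl:.3f}")
print(f"share of in-control subgroup means outside the limits: "
      f"{out_of_limits / num_subgroups:.2%}")
```

For exactly normal subgroup means, about 0.27% of points fall outside 3-sigma limits. With this strongly skewed process and subgroups of only 5, the theoretical share is higher, roughly 0.9%, because the normal approximation is still rough at that subgroup size; it moves toward 0.27% as the subgroup size grows.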

Your comments or questions about this article are welcome, as are suggestions for future articles.  Feel free to contact me by email at roger@keyperformance.com.

About the author:  Mr. Roger C. Ellis is an industrial engineer by training and profession.  He is a Six Sigma Master Black Belt with over 48 years of business experience in a wide range of fields.  Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC.   For a more detailed biography, please refer to www.keyperformance.com.