In statistics, a data sample is collected or selected from a larger population. The population includes all of the data that are of interest. Populations are often so large that it is impossible or impractical to collect data about the entire population. We therefore rely on samples to make inferences or estimates about the population. Samples must be selected in an unbiased manner so that they give the best representation of the population. The best approach to selecting samples in an unbiased fashion is to select them randomly.
We then use the sample data to make an estimate or best guess about some parameter of the population. For example, we might be interested in the average height of male students at a university. There are two types of estimates that we can make from sample data: point estimates and interval estimates.
A point estimate is a single value. For example, we could collect a random sample of male students, measure their heights, and calculate the sample mean, giving a point estimate of 5 feet 9 inches for average height.
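As a rough illustration of a point estimate, the sketch below draws a simple random sample from a simulated population and takes the sample mean. The population itself (10,000 heights, normally distributed around 69 inches with a standard deviation of 3 inches) and the sample size of 50 are assumptions made up for this example, not figures from a real study.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in inches of 10,000 male students,
# simulated as normal with mean 69 in and standard deviation 3 in.
population = [random.gauss(69.0, 3.0) for _ in range(10_000)]

# Draw a simple random sample of 50 students without replacement.
sample = random.sample(population, 50)

# The sample mean is the point estimate of the population mean.
point_estimate = statistics.mean(sample)
print(f"Point estimate of mean height: {point_estimate:.1f} in")
```

Changing the seed gives a different sample and therefore a slightly different point estimate.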
It is quite possible that if we collected a second sample, the point estimate of average male student height would vary somewhat from the first sample. It further stands to reason that the more variation that there is in the height of students in the population, the more one sample mean could vary from the next.
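The link between population variability and sample-to-sample variation can be checked with a small simulation. The sketch below is only illustrative: the function name `spread_of_sample_means`, the normal population centered at 69 inches, and the two standard deviations compared (2 and 4 inches) are all assumptions chosen for the example.

```python
import random
import statistics

def spread_of_sample_means(pop_sd, n=50, repeats=1000, seed=0):
    """Return the standard deviation of the means of many random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        # One random sample of n heights from a hypothetical normal population.
        heights = [rng.gauss(69.0, pop_sd) for _ in range(n)]
        means.append(statistics.mean(heights))
    return statistics.stdev(means)

low = spread_of_sample_means(pop_sd=2.0)   # less variable population
high = spread_of_sample_means(pop_sd=4.0)  # more variable population
print(f"Spread of sample means, population sd 2 in: {low:.3f}")
print(f"Spread of sample means, population sd 4 in: {high:.3f}")
```

The sample means from the more variable population are spread more widely, which is why a point estimate alone says nothing about how far it might be from the true value.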
We can use our knowledge of the amount of variability in the population, measured by standard deviation, to construct an interval estimate instead of a point estimate. If we do not know the population standard deviation, we can estimate it from the data sample. An interval estimate gives us a range of values within which the true population parameter is estimated to lie. The most common interval estimate is the confidence interval. A confidence interval is calculated for a chosen confidence level, which describes the proportion of intervals, across many repeated samples, that would contain the true value of the population parameter.
Let’s say that we choose a confidence level of 95%. We select a data sample and calculate a confidence interval from that data sample. In the case of the male student population, such an interval might be calculated as 5 feet 8 inches to 5 feet 10 inches. If we were to select many additional samples and calculate a confidence interval from each one in the same way, about 95% of those intervals would contain the true population mean. Note that it is the intervals that vary from sample to sample; the true population mean is fixed.
Unfortunately, confidence intervals are commonly misunderstood. A 95% confidence level does not mean that, for a given interval calculated from sample data, there is a 95% probability that the true population parameter lies within the interval. Once an experiment is done and an interval calculated, this interval either covers the parameter value or it does not. It is no longer a matter of probability. The 95% probability relates to the reliability of the estimation procedure, not to a specific calculated interval.
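One common way to turn the ideas above into a calculation is the normal-based interval for a mean: the sample mean plus or minus a critical value times the standard deviation divided by the square root of the sample size. The sketch below assumes that form. The function name `mean_confidence_interval`, the ten height values, and the population standard deviation of 3 inches are hypothetical. When the standard deviation is estimated from a small sample, a critical value from Student's t distribution is usually preferred over the normal value used here.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95, pop_sd=None):
    """Normal-based confidence interval for a population mean.

    If pop_sd is None, the sample standard deviation is used instead,
    which is only a reasonable approximation for larger samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = pop_sd if pop_sd is not None else statistics.stdev(sample)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample of ten heights in inches.
heights = [68.1, 70.4, 69.2, 67.5, 71.0, 69.8, 68.7, 70.1, 69.5, 68.9]
lower, upper = mean_confidence_interval(heights, pop_sd=3.0)
print(f"95% confidence interval: {lower:.2f} in to {upper:.2f} in")
```

A larger sample or a smaller standard deviation narrows the interval, while a higher confidence level widens it.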
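The statement that the 95% describes the procedure, not any single interval, can be illustrated by simulation. The sketch below repeatedly samples from a population whose mean is known, builds a normal-based interval each time, and counts how often the interval covers the true mean. The function name `coverage`, the population parameters, and the number of repetitions are assumptions chosen for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(true_mean=69.0, pop_sd=3.0, n=50, confidence=0.95,
             repeats=2000, seed=1):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * pop_sd / math.sqrt(n)  # population sd treated as known
    hits = 0
    for _ in range(repeats):
        heights = [rng.gauss(true_mean, pop_sd) for _ in range(n)]
        sample_mean = statistics.mean(heights)
        # Each individual interval either covers the true mean or it does not.
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / repeats

rate = coverage()
print(f"Share of intervals covering the true mean: {rate:.1%}")
```

Across many repetitions the share should come out close to the chosen confidence level, even though each individual interval simply does or does not contain the true mean.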
About the author: Mr. Roger C. Ellis is an industrial engineer by training and profession. He is a Six Sigma Master Black Belt with over 45 years of business experience in a wide range of fields. Mr. Ellis develops and instructs Six Sigma professional certification courses for Key Performance LLC. For a more detailed biography, please refer to www.keyperformance.com.