Data Analytics, Part 2

In the previous article, we established that data sets come in two varieties: Population and Sample. We also established that the data sets are uniform, i.e., a single attribute is recorded, such as height or color.

Now, we want to establish that Statistics focuses on finding two things, and two things only: the average (or center) of the data, and how spread out the data is. The first is called “central tendency” and the second is called “dispersion”, “spread”, or “variability”. Note that reducing variability is the goal of Six Sigma, which is why Six Sigma relies on Statistics.

Central tendency is most commonly measured in three ways: mean, median, and mode. Dispersion is also most commonly measured in three ways: range, variance, and standard deviation.
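To make the six measures concrete, here is a minimal sketch using Python's standard `statistics` module. The `summarize` helper and the `heights` list are hypothetical, made up for illustration only. The variance and standard deviation use the module's Population formulas (`pvariance`, `pstdev`), so the sketch assumes the values are a complete Population. The module has no range function, so range is computed as largest minus smallest.

```python
import statistics

def summarize(values):
    """Return common measures of central tendency and dispersion.

    Hypothetical helper. Assumes `values` is a complete Population,
    so it uses the population formulas pvariance and pstdev.
    """
    return {
        # Central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

# Hypothetical heights in centimetres (illustrative data only).
heights = [160, 165, 165, 170, 170, 170, 175, 175, 180]

for name, value in summarize(heights).items():
    print(f"{name:>8}: {value:.2f}")
```

For these values, the mean, median, and mode all come out to 170 cm, and the range is 20 cm.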

There is another critical part of Statistics. Any data set can be organized into a table called a “frequency distribution”. The table has two columns: the first lists the values on the data set's scale, and the second holds the count of how many times each value occurs. A frequency distribution lends itself to graphing as a bar chart, with one bar per value and each bar's height equal to its count. For many data sets, a pattern emerges, marked by the blue line in the chart.

This mountain-shaped curve is informally known as the “bell curve” because it resembles the outline of a church bell.
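Building a frequency distribution table can be sketched in a few lines of Python. This is an illustrative sketch, not the author's method: the `frequency_distribution` and `print_table_and_bars` helpers and the `shoe_sizes` data are hypothetical, and a row of `#` characters stands in for a real bar chart to avoid any plotting library. It counts each distinct value; continuous measurements such as height would normally be grouped into ranges (bins) first.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count) pairs sorted by value: the table's two columns."""
    return sorted(Counter(values).items())

def print_table_and_bars(values):
    """Print the frequency table, with a text bar whose length equals the count."""
    print(f"{'Value':>6} | {'Count':>5} | Bar")
    for value, count in frequency_distribution(values):
        print(f"{value:>6} | {count:>5} | {'#' * count}")

# Hypothetical shoe sizes (illustrative data only).
shoe_sizes = [7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 11, 12]
print_table_and_bars(shoe_sizes)
```

Here the counts rise from 1 to a peak of 4 at size 9 and then fall back to 1, a small version of the mountain shape described above.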

We have finished describing the basics of Statistics.

The remainder of Statistics is about analyzing the frequency distribution of either a Population or a Sample data set for central tendency and dispersion. As you can imagine, analyzing Population data sets is much easier: a Population contains every member, so its measures are exact, while a Sample contains only some members, so its measures are only estimates of the Population's. Unfortunately, Population data sets are rare. Sample data sets are the norm. So learn to live with it. In the next article, we will focus on analyzing Population data sets for central tendency and dispersion.
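One concrete way the Population/Sample distinction shows up is in the dispersion formulas. Under the common convention (Bessel's correction), the sum of squared deviations is divided by N for a Population but by n − 1 for a Sample. The sketch below, using Python's `statistics` module and a hypothetical `heights` list, computes both versions for the same values so the difference is visible.

```python
import statistics

# Hypothetical heights in centimetres (illustrative data only).
heights = [160, 165, 165, 170, 170, 170, 175, 175, 180]

# Sum of squared deviations from the mean, computed by hand.
mean = statistics.mean(heights)
sum_sq = sum((x - mean) ** 2 for x in heights)

# Treat the values as a complete Population: divide by N.
population_variance = statistics.pvariance(heights)

# Treat the same values as a Sample from a larger Population:
# divide by n - 1, which gives a slightly larger estimate.
sample_variance = statistics.variance(heights)

print(f"sum of squared deviations: {sum_sq}")
print(f"population variance (/ N):     {population_variance:.2f}")
print(f"sample variance (/ (n - 1)):   {sample_variance:.2f}")
```

With nine values and a sum of squared deviations of 300, the Population variance is 300 / 9 ≈ 33.33 and the Sample variance is 300 / 8 = 37.50.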