Data analysis is growing by leaps and bounds. Business analysts, project managers, and Six Sigma practitioners have access to more and more data. The key is deriving useful information from that raw data. That is where data analytics comes in.
TRUISM! Raw data is not fit for human consumption, but the information we derive from the data is.
Data analytics can be grouped into three sexy tiers, by level of complexity. The first and simplest tier is Descriptive Analytics. The second tier is Predictive (Inferential) Analytics, and the third and most recent tier is Prescriptive Analytics.
Now for a surprise. The first two tiers of analytics are covered by Statistics. Just like the old and boring Statistical Process Control was rebranded as the sexy and now famous “Six Sigma”, Statistics has been rebranded as “Descriptive and Predictive Analytics”.
If you are or want to become a lean manufacturing practitioner, be happy; no statistics is needed. If you are a project manager, you need to brush up on your statistics if you want to improve your estimating techniques for activity durations, resources, and costs, as well as for risk analysis. If you want to be a Six Sigma practitioner, then Statistics is a must.
Statistics comes from the word “state”: states, i.e. countries, counted their people and resources for taxation and military recruitment. That is why the word “population” is still used in Statistics to refer to the complete data set.
Statistics is a powerful tool for analyzing data, whether for business, science, or medicine. Yet Statistics is feared by people even more than Calculus. It doesn’t have to be that way. Follow me as I guide you “down the rabbit hole” (a metaphor for an entry into the unknown, the disorientating or the mentally deranging, from its use in Alice’s Adventures in Wonderland).
Statistics is simply defined as the collection, analysis, and presentation of data sets. Collection refers to measuring or gathering the data as well as tabulating it (entering it into Excel). Analysis refers to using the appropriate techniques to study the data set (like the mean, the standard deviation, etc.), and presentation means creating charts (in Excel, for example) to visually display the findings. A data set is composed of the same measurement taken over an entire population or a sample. Examples of data sets range from class test scores to the lifespan of tires.
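To make the "analysis" step concrete, here is a minimal sketch in Python rather than Excel. The test scores are made up purely for illustration, and the choice of Python's built-in statistics module is my assumption, not a requirement of anything above.

```python
import statistics

# Hypothetical class test scores (made-up data for illustration only)
scores = [72, 85, 90, 66, 78, 95, 88, 70]

# The mean is the arithmetic average of the scores
mean_score = statistics.mean(scores)

# pstdev treats the list as the entire population, not a sample
std_score = statistics.pstdev(scores)

print(f"mean = {mean_score:.2f}, standard deviation = {std_score:.2f}")
```

The mean tells you where the scores are centered, and the standard deviation tells you how spread out they are around that center.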
Data sets come in two flavors: population and sample. A population data set has the entire collection of data. As an example, if 20 students took the exam and you have all 20 test scores, then you have a population data set. It is straightforward to calculate the class average. If, however, you have just 5 test scores out of 20, then you have a sample data set. It is much harder, and less precise, to estimate the class average from just 5 test scores.
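The 20-versus-5 example can be sketched in a few lines of Python. The scores below are invented for illustration, and the fixed random seed is only there so the same 5 students are picked every time you run it.

```python
import random
import statistics

# Hypothetical scores for all 20 students: the population (made-up data)
population = [55, 62, 68, 70, 71, 73, 75, 76, 78, 80,
              81, 82, 84, 85, 87, 88, 90, 92, 95, 98]

# With the whole population in hand, the class average is exact
population_mean = statistics.mean(population)

# Pretend we only got 5 of the 20 scores, picked at random
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, 5)

# The sample average is only an estimate of the class average
sample_mean = statistics.mean(sample)

print(f"population mean (all 20 scores): {population_mean:.2f}")
print(f"sample mean (5 scores):          {sample_mean:.2f}")
```

Change the seed and the sample mean moves around while the population mean stays put. That wobble is exactly the "less precise" part.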
Getting a data set for the entire population is difficult. Imagine asking all 300 million Americans their age! It is much easier to ask just 500 Americans. With Statistics, you can estimate the average age of 300 million Americans from a sample of just 500, along with a measure of how far off that estimate is likely to be. That is the power of Data Analytics, or Statistics (as it used to be called).
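Here is a rough sketch of that idea in Python. Everything in it is an assumption made for illustration: the "population" is 100,000 randomly generated ages rather than real census data, and the margin of error uses the common normal approximation (1.96 standard errors for roughly 95% confidence).

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic stand-in for a large population of ages (NOT real data)
population_ages = [rng.randint(0, 90) for _ in range(100_000)]
true_mean = statistics.mean(population_ages)

# Ask only 500 "people" their age
sample = rng.sample(population_ages, 500)
sample_mean = statistics.mean(sample)

# Standard error: sample standard deviation divided by sqrt(sample size)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% margin of error (normal approximation)
margin = 1.96 * standard_error
low, high = sample_mean - margin, sample_mean + margin

print(f"estimate from 500 ages: {sample_mean:.1f} (about {low:.1f} to {high:.1f})")
print(f"actual population mean: {true_mean:.1f}")
```

The sample does not give you the exact answer. It gives you an estimate plus a range, and the range shrinks as the sample grows.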
In this series, we will explore Statistics in a simple and refreshing manner. Stay tuned!