### Data Segmentation

In Six Sigma we collect and analyze data in order to look for patterns that will help us understand the behavior of a process and where to focus our improvement efforts.  Prior to collecting the data, we need to think about how we want to analyze the data.  If we fail to collect the detail that we need for analysis, our analysis efforts will obviously suffer.

Data segmentation is used to help us think about how to collect data.  Data segmentation is the act of dividing information into groups that are similar in specific ways.  There are four major categories that are typically used for segmenting data: Who, What, Where and When.

To illustrate the application of this approach, we will discuss an example that was submitted by one of my Green Belt students in response to a course assignment. The assignment was to choose a problem that required analysis, and then develop a data segmentation plan prior to the collection of data.  The problem is that employees are not arriving on time for work.  For each instance that an employee arrives late to work, data will be collected for Who, What, Where and When.

The Who segmentation factor will be the class that the employee falls into.  The classifications will be executive, manager, engineer, machinist and assembly line worker.   Once the data is collected in this manner, we can determine if there is a significant difference in late arrivals between the classes of employees.

The What factor in this case is the transportation method.  Data will be collected for each employee regarding how they travel to work.  The choices are car, bus, train, bicycle and walking.  The data will then be analyzed to see if there is a difference in performance based on the method of travel.

The Where factor in this case is the physical department within the building.  The departments are front office, main wing, engineering wing, machine shop, and clean room.  We will look for differences in performance from one department to another.

The When factor was identified as either morning shift or evening shift, in order to see if one shift performs better than the other.

By collecting and analyzing data in this manner, we can rule out factors that are not associated with poor performance and zero in on those that do make a difference.
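
As a minimal sketch of how such a plan might be recorded and tallied, assume each late arrival is logged as one record carrying all four factors. The field names, the `tally_by_factor` helper and the sample records below are hypothetical, invented purely for illustration; they are not data from the student's assignment.

```python
from collections import Counter

# Hypothetical late-arrival records; the values are illustrative only.
late_arrivals = [
    {"who": "engineer", "what": "car", "where": "engineering wing", "when": "morning"},
    {"who": "machinist", "what": "bus", "where": "machine shop", "when": "morning"},
    {"who": "engineer", "what": "train", "where": "engineering wing", "when": "evening"},
    {"who": "manager", "what": "car", "where": "front office", "when": "morning"},
]

def tally_by_factor(records, factor):
    """Count late arrivals for each subgroup of one segmentation factor."""
    return Counter(record[factor] for record in records)

# Print one tally per segmentation factor.
for factor in ("who", "what", "where", "when"):
    print(factor, dict(tally_by_factor(late_arrivals, factor)))
```

Raw counts alone can mislead when subgroups differ in size, so in practice the counts would usually be compared against the number of employees in each subgroup.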

A common problem that students have when completing this assignment is submitting only one classification or subgroup within a major category.  An example in this case would be collecting data for only one type of transportation method.  Another example would be collecting data for only one shift.  We must use two or more classifications or subgroups within each of the four major categories (Who, What, Where and When) in order for our subsequent analysis to be meaningful, because a category with only one subgroup gives us nothing to compare against.
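
The two-or-more rule can be checked before any data is collected. The sketch below writes the example plan as a dictionary and flags any category with fewer than two subgroups; the `factors_needing_more_subgroups` helper is a hypothetical name chosen for illustration.

```python
# Segmentation plan from the late-arrival example.
plan = {
    "Who": ["executive", "manager", "engineer", "machinist", "assembly line worker"],
    "What": ["car", "bus", "train", "bicycle", "walking"],
    "Where": ["front office", "main wing", "engineering wing", "machine shop", "clean room"],
    "When": ["morning shift", "evening shift"],
}

def factors_needing_more_subgroups(plan):
    """Return the categories that have fewer than two distinct subgroups."""
    return [factor for factor, subgroups in plan.items() if len(set(subgroups)) < 2]

# The example plan passes the check.
print(factors_needing_more_subgroups(plan))

# A plan that collects data for only one shift is flagged on "When".
one_shift_plan = dict(plan, When=["morning shift"])
print(factors_needing_more_subgroups(one_shift_plan))
```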