# 68. Evaluating Attribute Inspectors Using Attribute Agreement Analysis

We typically think about the repeatability and reproducibility of a gage in the context of measuring some characteristic using continuous (or variable) data. Gage R&R studies conducted on this type of data are an application of the Analysis of Variance (ANOVA) method. We evaluate how much variation is introduced by the people using the gage and by the gage itself. We want a measurement system in which gage error (repeatability) and operator error (reproducibility) consume only a small percentage of the overall variation, leaving a large percentage for detecting part-to-part or event-to-event variation.

In situations where we are measuring attributes rather than continuous data, we still need a way to evaluate how well the attribute inspection system is performing. The technique that we use in this case is Attribute Agreement Analysis. Its objective is to determine how consistent the inspectors are with themselves and with each other, and how consistently they identify the attributes correctly.

First we must establish the true state of the attribute for all of the objects to be measured. There are three approaches that we can use. The first is Expert Judgment, where an expert looks at the results of an inspector and decides which results are correct and which are incorrect. The second is a Round Robin Study, where a set of objects is chosen that represents the full range of the attributes. Each item is evaluated by an expert and its condition recorded. Each item is then evaluated by each inspector at least twice. The third is an Inspector Concurrence Study, where a set of objects is chosen that represents the full range of attributes, and each item is evaluated by every inspector at least twice.

Let’s look at an example of an Attribute Agreement Analysis. For the purposes of this study, a set of 30 objects was identified and each was classified as good or bad by an expert. A Round Robin Study was then conducted, in which two inspectors each evaluated every object twice and recorded it as either good or bad. Here is the resulting analysis of the data using Minitab.

The graph on the left shows the 95% confidence interval for within-appraiser agreement, i.e. how often we can expect each appraiser to reach the same decision when evaluating the same object more than once. What we see here is that each appraiser is fairly consistent in his or her decisions.
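To make the within-appraiser calculation concrete, here is a minimal Python sketch. It assumes an object counts as "matched" when all of an appraiser's trials on it agree, and it uses the exact (Clopper-Pearson) binomial interval as one common choice for the 95% confidence interval. Whether this is the same interval behind the graph is an assumption. The function names and the five-object data set are hypothetical and are not the study data.

```python
from math import comb

def within_appraiser_agreement(trials):
    """Count objects on which ONE appraiser gave the same rating in every trial.

    trials: list of tuples, one tuple of ratings per object.
    Returns (matched, inspected).
    """
    matched = sum(1 for ratings in trials if len(set(ratings)) == 1)
    return matched, len(trials)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion k/n.

    Assumed method for illustration; solved by bisection so that only
    the standard library is needed.
    """
    def solve(f, target, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, which increases with p.
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= k | p) = alpha/2, which decreases with p.
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper

# Hypothetical ratings: one appraiser, five objects, two trials each.
appraiser_a = [("good", "good"), ("bad", "bad"), ("good", "bad"),
               ("bad", "bad"), ("good", "good")]

matched, n = within_appraiser_agreement(appraiser_a)
low, high = clopper_pearson(matched, n)
print(f"{matched}/{n} matched, 95% CI: {low:.3f} to {high:.3f}")
```

With only a handful of objects the interval is very wide, which is one reason a study like the one above uses a larger set of objects.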

The graph on the right shows the 95% confidence interval for agreement between each appraiser and the standard. In other words, how often does the inspector make the correct decision? Here we see a different story. The inspectors do not do a very good job of making the correct decision, which indicates that their approach needs to be modified to improve their decision making. Inspector 1 is consistent in his decisions but makes the correct decision only about half the time. Inspector 2 is also consistent in his decisions, but he makes the wrong decision around 80% of the time.
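The gap between consistency and correctness can be illustrated with a short Python sketch. It assumes an object counts as agreeing with the standard only when every one of the appraiser's trials matches the expert's rating. That counting rule, the function names, and the five-object data set are assumptions for illustration, not the study data.

```python
def within_appraiser_agreement(trials):
    """Count objects on which the appraiser gave the same rating in every trial."""
    matched = sum(1 for ratings in trials if len(set(ratings)) == 1)
    return matched, len(trials)

def agreement_with_standard(trials, standard):
    """Count objects on which every trial matches the expert's rating."""
    matched = sum(
        1 for ratings, truth in zip(trials, standard)
        if all(r == truth for r in ratings)
    )
    return matched, len(standard)

# Hypothetical expert ratings for five objects.
standard = ["good", "bad", "good", "bad", "good"]

# Hypothetical appraiser who rejects everything: perfectly repeatable,
# but correct only on the objects that really are bad.
always_bad = [("bad", "bad")] * 5

within = within_appraiser_agreement(always_bad)
vs_std = agreement_with_standard(always_bad, standard)
print(f"within appraiser: {within[0]}/{within[1]}")  # 5/5
print(f"versus standard:  {vs_std[0]}/{vs_std[1]}")  # 2/5
```

An appraiser like this one scores perfectly on the first measure and poorly on the second, which is why the two are reported separately.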