Precision and Recall

Another instance where accuracy scores can be misleading is in "search" tasks, such as information retrieval, where we are attempting to find documents that are relevant to a particular task. Since the number of irrelevant documents far outweighs the number of relevant documents, the accuracy score for a model that labels every document as irrelevant would be very close to 100%.
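To make this concrete, consider a toy illustration (the numbers are invented for the example): a collection of 1,000 documents of which only 5 are relevant.

# A model that labels every document as irrelevant is still correct on the
# 995 genuinely irrelevant documents, so its accuracy is high even though
# it never finds a single relevant one.
accuracy = 995 / 1000
print(accuracy)   # 0.995, i.e. 99.5%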

It is therefore conventional to employ a different set of measures for search tasks, based on the number of items in each of the four categories shown in Figure 6-3:

• True positives are relevant items that we correctly identified as relevant.

• True negatives are irrelevant items that we correctly identified as irrelevant.

• False positives (or Type I errors) are irrelevant items that we incorrectly identified as relevant.

• False negatives (or Type II errors) are relevant items that we incorrectly identified as irrelevant.

Figure 6-3. True and false positives and negatives within a document collection.
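The four counts can be tallied directly with Python set operations. The sketch below uses a small made-up collection, with relevant standing for the gold-standard relevant documents and retrieved for the documents a hypothetical system returned:

all_docs = set(range(100))                   # the whole document collection
relevant = {3, 17, 28, 44, 90}               # gold-standard relevant documents
retrieved = {3, 17, 51, 90}                  # documents the system labeled relevant

tp = len(relevant & retrieved)               # true positives: retrieved and relevant
fp = len(retrieved - relevant)               # false positives (Type I errors)
fn = len(relevant - retrieved)               # false negatives (Type II errors)
tn = len(all_docs - (relevant | retrieved))  # true negatives: correctly ignored
print(tp, fp, fn, tn)                        # 3 1 2 94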

Given these four numbers, we can define the following metrics:

• Precision, which indicates how many of the items that we identified were relevant, is TP/(TP+FP).

• Recall, which indicates how many of the relevant items we identified, is TP/(TP+FN).

• The F-Measure (or F-Score), which combines the precision and recall to give a single score, is defined to be the harmonic mean of the precision and recall: (2 x Precision x Recall) / (Precision + Recall).
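Continuing the sketch above, the three metrics follow directly from the tp, fp and fn counts:

precision = tp / (tp + fp)                                   # 3 / 4 = 0.75
recall = tp / (tp + fn)                                       # 3 / 5 = 0.60
f_measure = (2 * precision * recall) / (precision + recall)   # harmonic mean
print(precision, recall, round(f_measure, 3))                 # 0.75 0.6 0.667

NLTK's nltk.metrics module also provides precision(), recall(), and f_measure() functions that take a reference set and a test set, so these scores need not be computed by hand.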
