Calculating Group Statistics

Finally, I wanted to produce a more detailed report on how many different groups were found and the number of exceptions in each, both relative (as a percentage) and absolute (the total number of occurrences).

I already have all the details in the dictionary, including the group name and total number of exceptions in the group. But the dictionaries are not sorted, and it would be nice to have a list presented in descending order, where the worst "offenders" are at the top.

Python has a very useful built-in function for sorting any iterable object: sorted(). This function accepts any iterable object, such as a list or a dictionary, and returns a new sorted list. The tricky part is that when you iterate through a dictionary, you iterate only through its keys, so if you call sorted() with a dictionary as its parameter, you get only a list of sorted keys!

>>> d = {'a': 10, 'b': 5, 'c': 20, 'd': 15}
>>> for i in d:
...     print i
...
a
c
b
d
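Calling sorted() directly on the dictionary makes the point concrete. The following is a minimal sketch rather than part of the original session; it assumes Python 3 syntax (print as a function), although sorted() treats a dictionary the same way in Python 2.

```python
# Sample data reused from the interactive session above.
d = {'a': 10, 'b': 5, 'c': 20, 'd': 15}

# Iterating over a dictionary yields its keys, so sorted() sees only the keys.
sorted_keys = sorted(d)
print(sorted_keys)

# The values are not part of the result; they must be looked up again by key.
values_in_key_order = [d[k] for k in sorted_keys]
print(values_in_key_order)
```

The result is ordered by key, and the counts play no part in the ordering.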

Obviously this isn't really what you want; you need both values in your result. Dictionaries have a built-in method, iteritems(), that returns the key/value pairs as an iterable object. If you use this instead, you'll get a slightly better result, showing both the key and value of each pair, but still sorted by key, which isn't what you want either:

>>> sorted(d.iteritems())
[('a', 10), ('b', 5), ('c', 20), ('d', 15)]

The sorted() function accepts a key argument that allows you to specify a function that will be used to extract a comparison key from each list element when the elements are composite, such as key/value pairs. In other words, this function should return the second value from each pair. You need a special function from the operator library: itemgetter(). I will use this function to extract the second value from each pair, and this value will be used by the sorted() function to sort the list:

>>> from operator import itemgetter
>>> t = ('a', 20)
>>> itemgetter(1)(t)
20

>>> sorted(d.iteritems(), key=itemgetter(1))
[('b', 5), ('a', 10), ('d', 15), ('c', 20)]
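To show what itemgetter(1) is doing as the key function, here is a short sketch that compares it with an equivalent lambda. It assumes Python 3, where items() takes the place of iteritems(); the lambda variant is my own illustration and is not used elsewhere in this section.

```python
from operator import itemgetter

d = {'a': 10, 'b': 5, 'c': 20, 'd': 15}

# items() is the Python 3 counterpart of Python 2's iteritems().
by_value = sorted(d.items(), key=itemgetter(1))

# A lambda that picks the second element of each pair does the same job.
by_value_lambda = sorted(d.items(), key=lambda pair: pair[1])

print(by_value)
print(by_value == by_value_lambda)
```

Both calls order the pairs by their second element; itemgetter() simply saves writing the lambda.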

And the final touch is telling sorted() to sort the list in reverse order, so that the list starts with the item that has the largest value:

>>> sorted(d.iteritems(), key=itemgetter(1), reverse=True)
[('c', 20), ('d', 15), ('a', 10), ('b', 5)]

Similarly, I generate and print the list of exception groups. I also add a percentage calculation, just to show the relative size of each group:

# Requires 'import operator' at the top of the module.
for i in sorted(categories.iteritems(), key=operator.itemgetter(1), reverse=True):
    print "%8s (%6.2f%%) : %s" % (i[1], 100 * float(i[1]) / float(self.count), i[0])
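The same report can be tried outside the class. The sketch below is a standalone approximation under several assumptions: it uses Python 3 syntax, the function name format_group_report is hypothetical, the total is passed in as a parameter in place of self.count, and the group names and counts are made up for illustration.

```python
from operator import itemgetter


def format_group_report(categories, total):
    """Return the report lines, largest group first.

    categories: dict mapping group name -> number of exceptions.
    total: overall exception count (stands in for self.count).
    """
    lines = []
    for name, count in sorted(categories.items(), key=itemgetter(1), reverse=True):
        percent = 100 * float(count) / float(total)
        lines.append("%8s (%6.2f%%) : %s" % (count, percent, name))
    return lines


# Made-up group names and counts, for illustration only.
sample = {'NullPointerException': 5, 'IOException': 20, 'SQLException': 15}
report = format_group_report(sample, sum(sample.values()))
for line in report:
    print(line)
```

Returning the lines instead of printing them inside the loop makes the formatting easier to check. Note that the sketch does not guard against a total of zero.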
