Lining Things Up

So far our formatting strings have generated output of arbitrary width on the page (or screen), such as %s and %d. We can specify a width as well, such as %6s, producing a string that is padded to width 6. It is right-justified by default, but we can include a minus sign, as in %-6s, to make it left-justified. In case we don't know in advance how wide a displayed value should be, the width value can be replaced with a star in the formatting string, as in %-*s, then specified using a variable.
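The three cases just described can be checked directly. The following is a minimal sketch; the string 'dog' and the variable name width are chosen here purely for illustration.

```python
# Width and justification with the % formatting operator.
right = '%6s' % 'dog'               # right-justified in a field of width 6 (the default)
left = '%-6s' % 'dog'               # the minus sign left-justifies instead
width = 6                           # illustrative width, known only at run time
starred = '%-*s' % (width, 'dog')   # the star takes its width from the tuple

print(repr(right))     # '   dog'
print(repr(left))      # 'dog   '
print(repr(starred))   # 'dog   '
```

Note that when a star is used, the width comes first in the tuple, followed by the value to be displayed.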

Other control characters are used for decimal integers and floating-point numbers. Since the percent character % has a special interpretation in formatting strings, we have to precede it with another % to get it in the output.

>>> count, total = 3205, 9375
>>> "accuracy for %d words: %2.4f%%" % (total, 100.0 * count / total)
'accuracy for 9375 words: 34.1867%'
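To see how the width and precision of a number interact, and how the doubled percent sign behaves, here is a small sketch. The values are illustrative only and are not drawn from any corpus.

```python
# In %W.Pf, W is the minimum field width and P is the number of decimal places.
padded = '%8.2f' % 3.14159     # '    3.14' (padded on the left to width 8)
unpadded = '%2.4f' % 3.14159   # '3.1416'   (the width 2 is only a minimum)
integer = '%4d' % 42           # '  42'     (integers take a width as well)
percent = '%d%%' % 50          # '50%'      (%% produces a literal percent sign)

for s in (padded, unpadded, integer, percent):
    print(repr(s))
```

A width smaller than the formatted value never truncates it, which is why %2.4f in the accuracy example still shows every digit.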

An important use of formatting strings is for tabulating data. Recall that in Section 2.1 we saw data being tabulated from a conditional frequency distribution. Let's perform the tabulation ourselves, exercising full control of headings and column widths, as shown in Example 3-5. Note the clear separation between the language processing work and the tabulation of results.

Example 3-5. Frequency of modals in different sections of the Brown Corpus.

def tabulate(cfdist, words, categories):
    print '%-16s' % 'Category',
    for word in words:                                # column headings
        print '%6s' % word,
    print
    for category in categories:
        print '%-16s' % category,                     # row heading
        for word in words:                            # for each word
            print '%6d' % cfdist[category][word],     # print table cell
        print

>>> import nltk
>>> from nltk.corpus import brown
>>> cfd = nltk.ConditionalFreqDist(
...           (genre, word)
...           for genre in brown.categories()
...           for word in brown.words(categories=genre))
>>> genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
>>> modals = ['can', 'could', 'may', 'might', 'must', 'will']
>>> tabulate(cfd, modals, genres)

Category            can  could    may  might   must   will
news                 93     86     66     38     50    389
religion             82     59     78     12     54     71
hobbies             268     58    131     22     83    264
science_fiction      16     49      4     12      8     16
romance              74    193     11     51     45     43
humor                16     30      8      8      9     13

Recall from the listing in Example 3-1 that we used a formatting string "%*s". This allows us to specify the width of a field using a variable.

>>> '%*s' % (15, "Monty Python")
'   Monty Python'

We could use this to automatically customize the column to be just wide enough to accommodate all the words, using width = max(len(w) for w in words). Remember that the comma at the end of print statements adds an extra space, and this is sufficient to prevent the column headings from running into each other.
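The idea of computing the column width from the words can be sketched as follows. This is only one possible arrangement, not the book's own listing. The function name format_table, the label_width variable, and the use of ' '.join in place of the trailing-comma print are all assumptions introduced here. The counts are copied from the news and religion rows of the table above, so the sketch runs without the Brown Corpus.

```python
def format_table(counts, words, categories):
    """Return a table whose columns are just wide enough for the words."""
    width = max(len(w) for w in words)      # widest column heading
    # Hypothetical extra step: size the row-label column the same way.
    label_width = max(len(c) for c in list(categories) + ['Category'])
    header = ['%-*s' % (label_width, 'Category')]
    header += ['%*s' % (width, w) for w in words]
    lines = [' '.join(header)]              # one space between columns
    for category in categories:
        row = ['%-*s' % (label_width, category)]
        row += ['%*d' % (width, counts[category][w]) for w in words]
        lines.append(' '.join(row))
    return '\n'.join(lines)

# Counts taken from the first two rows of the table shown earlier.
counts = {
    'news':     {'can': 93, 'could': 86, 'may': 66,
                 'might': 38, 'must': 50, 'will': 389},
    'religion': {'can': 82, 'could': 59, 'may': 78,
                 'might': 12, 'must': 54, 'will': 71},
}
modals = ['can', 'could', 'may', 'might', 'must', 'will']
table = format_table(counts, modals, ['news', 'religion'])
print(table)
```

Because the width is derived from the words alone, a count with more digits than the longest word would still push its column out of line. Taking the maximum over the formatted counts as well would be a natural refinement.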

