Python Language Verbs

Verbs are words that describe events and actions, e.g., fall and eat, as shown in Table 5-3. In the context of a sentence, verbs typically express a relation involving the referents of one or more noun phrases.

Table 5-3. Syntactic patterns involving some verbs Word Simple With modifiers and adjuncts (italicized)

fall Rome fell Dot com stocks suddenly fell like a stone eat Mice eat cheese John ate the pizza with gusto

What are the most common verbs in news text? Let's sort all the verbs by frequency:

>>> wsj = nltk.corpus.treebank.tagged_words(simplify_tags=True) >>> word_tag_fd = nltk.FreqDist(wsj)

>>> [word + "/" + tag for (word, tag) in word_tag_fd if tag.startswith('V')] ['is/V', 'said/VD', 'was/VD', 'are/V', 'be/V', 'has/V', 'have/V', 'says/V', 'were/VD', 'had/VD', 'been/VN', "'s/V", 'do/V', 'say/V', 'make/V', 'did/VD', 'rose/VD', 'does/V', 'expected/VN', 'buy/V', 'take/V', 'get/V', 'sell/V', 'help/V', 'added/VD', 'including/VG', 'according/VG', 'made/VN', 'pay/V', ...]

Note that the items being counted in the frequency distribution are word-tag pairs. Since words and tags are paired, we can treat the word as a condition and the tag as an event, and initialize a conditional frequency distribution with a list of condition-event pairs. This lets us see a frequency-ordered list of tags given a word:

>>> cfd1 = nltk.ConditionalFreqDist(wsj) >>> cfd1['yield'].keys() ['V', 'N']

>>> cfd1['cut'].keys() ['V', 'VD', 'N', 'VN']

We can reverse the order of the pairs, so that the tags are the conditions, and the words are the events. Now we can see likely words for a given tag:

>>> cfd2 = nltk.ConditionalFreqDist((tag, word) for (word, tag) in wsj) >>> cfd2['VN'].keys()

['been', 'expected', 'made', 'compared', 'based', 'priced', 'used', 'sold', 'named', 'designed', 'held', 'fined', 'taken', 'paid', 'traded', 'said', ...]

To clarify the distinction between VD (past tense) and VN (past participle), let's find words that can be both VD and VN, and see some surrounding text:

>>> [w for w in cfdl.conditions() if 'VD' in cfdl[w] and 'VN' in cfdl[w]] ['Asked', 'accelerated', 'accepted', 'accused', 'acquired', 'added', 'adopted', ...] >>> idxl = wsj.index(('kicked', 'VD')) >>> wsj[idx1-4:idx1+l]

[('While', 'P'), ('program', 'N'), ('trades', 'N'), ('swiftly', 'ADV'), ('kicked', 'VD')]

>>> idx2 = wsj.index(('kicked', 'VN')) >>> wsj[idx2-4:idx2+l]

[('head', 'N'), ('of', 'P'), ('state', 'N'), ('has', 'V'), ('kicked', 'VN')]

In this case, we see that the past participle of kicked is preceded by a form of the auxiliary verb have. Is this generally true?

Your Turn: Given the list of past participles specified by cfd2['VN'].keys(), try to collect a list of all the word-tag pairs that immediately precede items in that list.

Was this article helpful?

0 0

Post a comment