Verbs are words that describe events and actions, e.g., fall and eat, as shown in Table 5-3. In the context of a sentence, verbs typically express a relation involving the referents of one or more noun phrases.
Table 5-3. Syntactic patterns involving some verbs Word Simple With modifiers and adjuncts (italicized)
fall Rome fell Dot com stocks suddenly fell like a stone eat Mice eat cheese John ate the pizza with gusto
What are the most common verbs in news text? Let's sort all the verbs by frequency:
>>> wsj = nltk.corpus.treebank.tagged_words(simplify_tags=True) >>> word_tag_fd = nltk.FreqDist(wsj)
>>> [word + "/" + tag for (word, tag) in word_tag_fd if tag.startswith('V')] ['is/V', 'said/VD', 'was/VD', 'are/V', 'be/V', 'has/V', 'have/V', 'says/V', 'were/VD', 'had/VD', 'been/VN', "'s/V", 'do/V', 'say/V', 'make/V', 'did/VD', 'rose/VD', 'does/V', 'expected/VN', 'buy/V', 'take/V', 'get/V', 'sell/V', 'help/V', 'added/VD', 'including/VG', 'according/VG', 'made/VN', 'pay/V', ...]
Note that the items being counted in the frequency distribution are word-tag pairs. Since words and tags are paired, we can treat the word as a condition and the tag as an event, and initialize a conditional frequency distribution with a list of condition-event pairs. This lets us see a frequency-ordered list of tags given a word:
>>> cfd1 = nltk.ConditionalFreqDist(wsj) >>> cfd1['yield'].keys() ['V', 'N']
>>> cfd1['cut'].keys() ['V', 'VD', 'N', 'VN']
We can reverse the order of the pairs, so that the tags are the conditions, and the words are the events. Now we can see likely words for a given tag:
>>> cfd2 = nltk.ConditionalFreqDist((tag, word) for (word, tag) in wsj) >>> cfd2['VN'].keys()
['been', 'expected', 'made', 'compared', 'based', 'priced', 'used', 'sold', 'named', 'designed', 'held', 'fined', 'taken', 'paid', 'traded', 'said', ...]
To clarify the distinction between VD (past tense) and VN (past participle), let's find words that can be both VD and VN, and see some surrounding text:
>>> [w for w in cfdl.conditions() if 'VD' in cfdl[w] and 'VN' in cfdl[w]] ['Asked', 'accelerated', 'accepted', 'accused', 'acquired', 'added', 'adopted', ...] >>> idxl = wsj.index(('kicked', 'VD')) >>> wsj[idx1-4:idx1+l]
[('While', 'P'), ('program', 'N'), ('trades', 'N'), ('swiftly', 'ADV'), ('kicked', 'VD')]
>>> idx2 = wsj.index(('kicked', 'VN')) >>> wsj[idx2-4:idx2+l]
[('head', 'N'), ('of', 'P'), ('state', 'N'), ('has', 'V'), ('kicked', 'VN')]
In this case, we see that the past participle of kicked is preceded by a form of the auxiliary verb have. Is this generally true?
Your Turn: Given the list of past participles specified by cfd2['VN'].keys(), try to collect a list of all the word-tag pairs that immediately precede items in that list.
Was this article helpful?