Recursion in Linguistic Structure

Building Nested Structure with Cascaded Chunkers

So far, our chunk structures have been relatively flat. Trees consist of tagged tokens, optionally grouped under a chunk node such as NP. However, it is possible to build chunk structures of arbitrary depth, simply by creating a multistage chunk grammar containing recursive rules. Example 7-6 has patterns for noun phrases, prepositional phrases, verb phrases, and sentences. This is a four-stage chunk grammar, and can be used to create structures having a depth of at most four.

Example 7-6. A chunker that handles NP, PP, VP, and S.

import nltk

grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PP: {<IN><NP>}               # Chunk prepositions followed by NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
    ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

>>> print(cp.parse(sentence))
(S
  (NP Mary/NN)
  saw/VBD
  (CLAUSE
    (NP the/DT cat/NN)
    (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

Unfortunately, this result misses the VP headed by saw. It has other shortcomings, too. Let's see what happens when we apply this chunker to a sentence having deeper nesting. Notice that it fails to identify the VP chunk starting at saw/VBD.

>>> sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
...     ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
...     ("on", "IN"), ("the", "DT"), ("mat", "NN")]
>>> print(cp.parse(sentence))
(S
  (NP John/NNP)
  thinks/VBZ
  (NP Mary/NN)
  saw/VBD
  (CLAUSE
    (NP the/DT cat/NN)
    (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))
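Why is saw/VBD left out? One way to see it is to run the four stages one at a time and look at the top-level labels after each stage. The following is a minimal sketch, assuming NLTK 3 is installed. It relies on nltk.RegexpParser.parse also accepting an existing tree, in which case it chunks that tree's top-level children. The names STAGES, top_level_labels and run_cascade are hypothetical helpers, not part of NLTK.

```python
import nltk

# Assumes NLTK 3. STAGES, top_level_labels and run_cascade are hypothetical
# helpers for illustration; they are not part of NLTK.
# The four rules of Example 7-6, kept as separate one-rule grammars so that
# each stage can be run and inspected on its own.
STAGES = [
    "NP: {<DT|JJ|NN.*>+}",
    "PP: {<IN><NP>}",
    "VP: {<VB.*><NP|PP|CLAUSE>+$}",
    "CLAUSE: {<NP><VP>}",
]

def top_level_labels(tree):
    """Chunk label for each subtree under the root, POS tag for each bare token."""
    return [child.label() if isinstance(child, nltk.Tree) else child[1]
            for child in tree]

def run_cascade(tagged_tokens):
    """Apply each stage once, in order, to the previous stage's output."""
    tree = tagged_tokens
    snapshots = []
    for rule in STAGES:
        tree = nltk.RegexpParser(rule).parse(tree)
        snapshots.append((rule.split(":")[0], top_level_labels(tree)))
    return tree, snapshots

sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
            ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

tree, snapshots = run_cascade(sentence)
for stage, labels in snapshots:
    print(f"after {stage:<6}", " ".join(labels))
```

When the VP stage runs, saw/VBD is followed by NP and VP rather than by a CLAUSE. The $-anchored VP pattern therefore cannot match there. The CLAUSE that saw should take as its argument is built only by the final stage, after the VP stage has already finished. Chaining the one-rule parsers in this way should give the same tree as the four-stage grammar.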

The solution to these problems is to get the chunker to loop over its patterns: after trying all of them, it repeats the process. We add the optional keyword argument loop to specify the number of times the set of patterns should be run:

>>> cp = nltk.RegexpParser(grammar, loop=2)
>>> print(cp.parse(sentence))
(S
  (NP John/NNP)
  thinks/VBZ
  (CLAUSE
    (NP Mary/NN)
    (VP
      saw/VBD
      (CLAUSE
        (NP the/DT cat/NN)
        (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))))
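To see how the loop count trades off against depth, the sketch below parses the same sentence with loop set to 1, 2 and 3. For each setting it reports the tree height and any words still attached directly to S. It assumes NLTK 3, and unchunked_tokens is a hypothetical helper.

```python
import nltk

# Assumes NLTK 3; unchunked_tokens is a hypothetical helper, not an NLTK API.
# Grammar from Example 7-6 (comments omitted).
grammar = r"""
  NP: {<DT|JJ|NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*><NP|PP|CLAUSE>+$}
  CLAUSE: {<NP><VP>}
  """

sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"),
            ("saw", "VBD"), ("the", "DT"), ("cat", "NN"), ("sit", "VB"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]

def unchunked_tokens(tree):
    """Words that are still bare (word, tag) pairs directly under the root S node."""
    return [child[0] for child in tree if not isinstance(child, nltk.Tree)]

results = {}
for loop in (1, 2, 3):
    tree = nltk.RegexpParser(grammar, loop=loop).parse(sentence)
    results[loop] = tree
    print(f"loop={loop}  height={tree.height()}  unattached={unchunked_tokens(tree)}")
```

Each extra pass lets the VP and CLAUSE rules wrap one more level. With loop=2 only thinks/VBZ remains unattached, and a third pass should absorb it into a single top-level CLAUSE.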

This cascading process enables us to create deep structures. However, creating and debugging a cascade is difficult, and there comes a point where it is more effective to do full parsing (see Chapter 8). Also, the cascading process can only produce trees of fixed depth (no deeper than the number of stages in the cascade), and this is insufficient for complete syntactic analysis.
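The fixed-depth limit can be made concrete by embedding the clause more deeply. This sketch assumes NLTK 3 and reuses the grammar of Example 7-6. The hypothetical helper embedded_sentence(k) prefixes "the cat sit on the mat" with k levels of "NAME thinks", which is a deliberately artificial input. The hypothetical helper passes_needed then finds the smallest loop value that leaves a single CLAUSE under S.

```python
import nltk

# Assumes NLTK 3. BASE, embedded_sentence, fully_analyzed and passes_needed
# are hypothetical names for illustration; none of them is an NLTK API.
# Grammar from Example 7-6 (comments omitted).
grammar = r"""
  NP: {<DT|JJ|NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*><NP|PP|CLAUSE>+$}
  CLAUSE: {<NP><VP>}
  """

BASE = [("the", "DT"), ("cat", "NN"), ("sit", "VB"),
        ("on", "IN"), ("the", "DT"), ("mat", "NN")]

def embedded_sentence(k):
    """Prefix BASE with k levels of '<name> thinks' (an artificial test input)."""
    names = ["John", "Mary", "Sue", "Kim"]
    prefix = []
    for i in range(k):
        prefix += [(names[i % len(names)], "NNP"), ("thinks", "VBZ")]
    return prefix + BASE

def fully_analyzed(tree):
    """True if the root S node has exactly one child and it is a CLAUSE subtree."""
    return (len(tree) == 1 and isinstance(tree[0], nltk.Tree)
            and tree[0].label() == "CLAUSE")

def passes_needed(tagged, max_loop=10):
    """Smallest loop value that fully analyzes `tagged`, or None if max_loop is not enough."""
    for loop in range(1, max_loop + 1):
        if fully_analyzed(nltk.RegexpParser(grammar, loop=loop).parse(tagged)):
            return loop
    return None

for k in range(4):
    print(k, "levels of embedding ->", passes_needed(embedded_sentence(k)), "passes")
```

Each additional level of embedding should require one more pass. So for any fixed loop value there is a sentence that the cascade cannot fully analyze.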
