Training Classifier-Based Chunkers

Both the regular expression-based chunkers and the n-gram chunkers decide what chunks to create entirely based on part-of-speech tags. However, sometimes part-of-speech tags are insufficient to determine how a sentence should be chunked. For example, consider the following two statements:

(3) a. Joey/NN sold/VBD the/DT farmer/NN rice/NN ./.

b. Nick/NN broke/VBD my/DT computer/NN monitor/NN ./.

These two sentences have the same part-of-speech tags, yet they are chunked differently. In the first sentence, the farmer and rice are separate chunks, while the corresponding material in the second sentence, the computer monitor, is a single chunk. Clearly, we need to make use of information about the content of the words, in addition to just their part-of-speech tags, if we wish to maximize chunking performance.
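
As an illustration (a sketch of the intended analyses, not the output of any chunker defined in this section), the two sentences would receive chunk structures along these lines:

(S (NP Joey/NN) sold/VBD (NP the/DT farmer/NN) (NP rice/NN) ./.)
(S (NP Nick/NN) broke/VBD (NP my/DT computer/NN monitor/NN) ./.)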

One way that we can incorporate information about the content of words is to use a classifier-based tagger to chunk the sentence. Like the n-gram chunker considered in the previous section, this classifier-based chunker will work by assigning IOB tags to the words in a sentence, and then converting those tags to chunks. For the classifier-based tagger itself, we will use the same approach that we used in Section 6.1 to build a part-of-speech tagger.

The basic code for the classifier-based NP chunker is shown in Example 7-5. It consists of two classes. The first class, ConsecutiveNPChunkTagger, is almost identical to the ConsecutivePosTagger class from Example 6-5. The only two differences are that it calls a different feature extractor and that it uses a MaxentClassifier rather than a NaiveBayesClassifier. The second class, ConsecutiveNPChunker, is basically a wrapper around the tagger class that turns it into a chunker. During training, this second class maps the chunk trees in the training corpus into tag sequences; in the parse() method, it converts the tag sequence provided by the tagger back into a chunk tree.

Example 7-5. Noun phrase chunking with a consecutive classifier.

import nltk

class ConsecutiveNPChunkTagger(nltk.TaggerI):

    def __init__(self, train_sents):
        # Build one training instance per token, pairing its feature set
        # with its IOB tag, then train a maximum entropy classifier.
        train_set = []
        for tagged_sent in train_sents:
            untagged_sent = nltk.tag.untag(tagged_sent)
            history = []
            for i, (word, tag) in enumerate(tagged_sent):
                featureset = npchunk_features(untagged_sent, i, history)
                train_set.append( (featureset, tag) )
                history.append(tag)
        self.classifier = nltk.MaxentClassifier.train(
            train_set, algorithm='megam', trace=0)

    def tag(self, sentence):
        history = []
        for i, word in enumerate(sentence):
            featureset = npchunk_features(sentence, i, history)
            tag = self.classifier.classify(featureset)
            history.append(tag)
        return zip(sentence, history)

class ConsecutiveNPChunker(nltk.ChunkParserI):

    def __init__(self, train_sents):
        # Convert each chunk tree into a list of ((word, pos), iob-tag)
        # triples before handing it to the tagger.
        tagged_sents = [[((w,t),c) for (w,t,c) in
                         nltk.chunk.tree2conlltags(sent)]
                        for sent in train_sents]
        self.tagger = ConsecutiveNPChunkTagger(tagged_sents)

    def parse(self, sentence):
        # Tag the sentence, then convert the IOB tags back into a chunk tree.
        tagged_sents = self.tagger.tag(sentence)
        conlltags = [(w,t,c) for ((w,t),c) in tagged_sents]
        return nltk.chunk.conlltags2tree(conlltags)
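
Before these classes can be used, we need chunked training and test data. The examples below assume that train_sents and test_sents have already been defined; one plausible way to obtain them, following the CoNLL-2000 corpus used with the earlier chunkers in this chapter, is sketched here. Note also that algorithm='megam' relies on the external megam package; if it is not installed, one of NLTK's built-in maxent training algorithms can be substituted.

>>> from nltk.corpus import conll2000
>>> train_sents = conll2000.chunked_sents('train.txt', chunk_types=['NP'])
>>> test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])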

The only piece left to fill in is the feature extractor. We begin by defining a simple feature extractor, which just provides the part-of-speech tag of the current token. Using this feature extractor, our classifier-based chunker is very similar to the unigram chunker, as is reflected in its performance:

>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     return {"pos": pos}
>>> chunker = ConsecutiveNPChunker(train_sents)
>>> print chunker.evaluate(test_sents)
ChunkParse score:
    IOB Accuracy:  92.9%
    Precision:     79.9%
    Recall:        86.7%

We can also add a feature for the previous part-of-speech tag. Adding this feature allows the classifier to model interactions between adjacent tags, and results in a chunker that is closely related to the bigram chunker.

>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     if i == 0:
...         prevword, prevpos = "<START>", "<START>"
...     else:
...         prevword, prevpos = sentence[i-1]
...     return {"pos": pos, "prevpos": prevpos}
>>> chunker = ConsecutiveNPChunker(train_sents)
>>> print chunker.evaluate(test_sents)
ChunkParse score:
    IOB Accuracy:  93.6%
    Precision:     81.9%
    Recall:        87.1%

Next, we'll try adding a feature for the current word, since we hypothesized that word content should be useful for chunking. We find that this feature does indeed improve the chunker's performance, by about 1.5 percentage points (which corresponds to about a 10% reduction in the error rate).

>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     if i == 0:
...         prevword, prevpos = "<START>", "<START>"
...     else:
...         prevword, prevpos = sentence[i-1]
...     return {"pos": pos, "word": word, "prevpos": prevpos}
>>> chunker = ConsecutiveNPChunker(train_sents)
>>> print chunker.evaluate(test_sents)
ChunkParse score:
    IOB Accuracy:  94.2%
    Precision:     83.4%
    Recall:        88.6%

Finally, we can try extending the feature extractor with a variety of additional features, such as lookahead features, paired features, and complex contextual features. The last of these, called tags-since-dt, creates a string describing the set of all part-of-speech tags that have been encountered since the most recent determiner.

>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     if i == 0:
...         prevword, prevpos = "<START>", "<START>"
...     else:
...         prevword, prevpos = sentence[i-1]
...     if i == len(sentence)-1:
...         nextword, nextpos = "<END>", "<END>"
...     else:
...         nextword, nextpos = sentence[i+1]
...     return {"pos": pos, "word": word, "prevpos": prevpos,
...             "nextpos": nextpos,
...             "prevpos+pos": "%s+%s" % (prevpos, pos),
...             "pos+nextpos": "%s+%s" % (pos, nextpos),
...             "tags-since-dt": tags_since_dt(sentence, i)}

>>> def tags_since_dt(sentence, i):
...     tags = set()
...     for word, pos in sentence[:i]:
...         if pos == 'DT':
...             tags = set()
...         else:
...             tags.add(pos)
...     return '+'.join(sorted(tags))

>>> chunker = ConsecutiveNPChunker(train_sents)
>>> print chunker.evaluate(test_sents)
ChunkParse score:
    IOB Accuracy:  95.9%
    Precision:     88.3%
    Recall:        90.7%

Your Turn: Try adding different features to the feature extractor function npchunk_features, and see if you can further improve the performance of the NP chunker.
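
As one possible starting point (these particular features are illustrative suggestions, not taken from the text above), the extractor could be given simple word-shape cues and access to the tag history that is already passed in but so far unused:

>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     if i == 0:
...         prevword, prevpos = "<START>", "<START>"
...     else:
...         prevword, prevpos = sentence[i-1]
...     return {"pos": pos, "word": word, "prevpos": prevpos,
...             "suffix3": word.lower()[-3:],          # last three characters of the word
...             "capitalized": word[0].isupper(),      # crude proper-noun cue
...             "prev-iob": history[-1] if history else "<START>"}  # previously assigned IOB tag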
