
Example 9-1. Example feature-based grammar.

>>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
% start S
# ###################
# Grammar Productions
# ###################
# S expansion productions
S -> NP[NUM=?n] VP[NUM=?n]
# NP expansion productions
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl]
# VP expansion productions
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
# ###################
# Lexical Productions
# ###################
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg] -> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'

Notice that a syntactic category can have more than one feature: for example, TV[TENSE=pres, NUM=pl]. In general, we can add as many features as we like.
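For instance (a hypothetical extension, not part of feat0.fcfg), a person feature PER could be recorded alongside TENSE and NUM in a lexical production:

IV[TENSE=pres, NUM=sg, PER=3] -> 'disappears' | 'walks'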

A final detail about Example 9-1 is the statement %start S. This "directive" tells the parser to take S as the start symbol for the grammar.

In general, when we are trying to develop even a very small grammar, it is convenient to put the productions in a file where they can be edited, tested, and revised. We have saved Example 9-1 as a file named feat0.fcfg in the NLTK data distribution. You can make your own copy of this for further experimentation using nltk.data.load().
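For example, the following sketch loads the grammar source as plain text and writes it to a local file for editing (the target filename my_feat0.fcfg is arbitrary):

>>> import nltk
>>> # format='text' returns the raw file contents rather than a parsed grammar object
>>> grammar_text = nltk.data.load('grammars/book_grammars/feat0.fcfg', format='text')
>>> open('my_feat0.fcfg', 'w').write(grammar_text)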

Feature-based grammars are parsed in NLTK using an Earley chart parser (see Section 9.5 for more information about this), and Example 9-2 illustrates how this is carried out. After tokenizing the input, we import the load_parser function, which takes a grammar filename as input and returns a chart parser cp. Calling the parser's nbest_parse() method will return a list trees of parse trees; trees will be empty if the grammar fails to parse the input, and otherwise will contain one or more parse trees, depending on whether the input is syntactically ambiguous.

Example 9-2. Trace of feature-based chart parser.

>>> tokens = 'Kim likes children'.split()
>>> from nltk import load_parser
>>> cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
>>> trees = cp.nbest_parse(tokens)
|.Kim .like.chil.|
|[----]    .    .| PropN[NUM='sg'] -> 'Kim' *
|[----]    .    .| NP[NUM='sg'] -> PropN[NUM='sg'] *
|[---->    .    .| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
|.    [----]    .| TV[NUM='sg', TENSE='pres'] -> 'likes' *
|.    [---->    .| VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
|.    .    [----]| N[NUM='pl'] -> 'children' *
|.    .    [----]| NP[NUM='pl'] -> N[NUM='pl'] *
|.    .    [---->| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
|.    [---------]| VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
|[==============]| S[] -> NP[NUM='sg'] VP[NUM='sg'] *

The details of the parsing procedure are not that important for present purposes. However, there is an implementation issue which bears on our earlier discussion of grammar size. One possible approach to parsing productions containing feature constraints is to compile out all admissible values of the features in question so that we end up with a large, fully specified CFG along the lines of (8). By contrast, the parsing process illustrated in the previous examples works directly with the underspecified productions given by the grammar. Feature values "flow upwards" from lexical entries, and variable values are then associated with those values via bindings (i.e., dictionaries) such as {?n: 'sg', ?t: 'pres'}. As the parser assembles information about the nodes of the tree it is building, these variable bindings are used to instantiate values in these nodes; thus the underspecified VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] NP[] becomes instantiated as VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] by looking up the values of ?n and ?t in the bindings.
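This binding mechanism can be observed in isolation using NLTK's FeatStruct class (a minimal sketch; feature structures and unification are treated in more detail later in the chapter):

>>> from nltk import FeatStruct
>>> underspec = FeatStruct("[NUM=?n, TENSE=?t]")       # like the VP production's features
>>> lexical = FeatStruct("[NUM='sg', TENSE='pres']")   # like the features of 'likes'
>>> bindings = {}
>>> print(underspec.unify(lexical, bindings))          # unification instantiates the variables
[ NUM   = 'sg'   ]
[ TENSE = 'pres' ]
>>> bindings   # the variables are now bound, cf. {?n: 'sg', ?t: 'pres'} in the trace
{Variable('?n'): 'sg', Variable('?t'): 'pres'}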

Finally, we can inspect the resulting parse trees (in this case, a single one).

>>> for tree in trees: print(tree)
(S[]
  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
  (VP[NUM='sg', TENSE='pres']
    (TV[NUM='sg', TENSE='pres'] likes)
    (NP[NUM='pl'] (N[NUM='pl'] children))))
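The node labels in these trees are themselves feature structures, so instantiated feature values can also be retrieved programmatically. A sketch, assuming the NLTK 2-era API used above (newer releases access a tree's label with tree.label() rather than tree.node):

>>> vp = trees[0][1]     # the VP subtree
>>> vp.node              # its label is a feature structure
VP[NUM='sg', TENSE='pres']
>>> vp.node['TENSE']     # feature values support dictionary-style access
'pres'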
