Exercises

1. 0 Can you come up with grammatical sentences that probably have never been uttered before? (Take turns with a partner.) What does this tell you about human language?

2. 0 Recall Strunk and White's prohibition against using a sentence-initial however to mean "although." Do a web search for however used at the start of the sentence. How widely used is this construction?

3. 0 Consider the sentence Kim arrived or Dana left and everyone cheered. Write down the parenthesized forms to show the relative scope of and and or. Generate tree structures corresponding to both of these interpretations.

4. o The Tree class implements a variety of other useful methods. See the Tree help documentation for more details (i.e., import the Tree class and then type help(Tree)).

5. o In this exercise you will manually construct some parse trees.

a. Write code to produce two trees, one for each reading of the phrase old men and women.

b. Encode any of the trees presented in this chapter as a labeled bracketing, and use nltk.Tree() to check that it is well-formed. Now use draw() to display the tree.

c. As in (a), draw a tree for The woman saw a man last Thursday.

6. 0 Write a recursive function to traverse a tree and return the depth of the tree, such that a tree with a single node would have depth zero. (Hint: the depth of a subtree is the maximum depth of its children, plus one.)

7. o Analyze the A.A. Milne sentence about Piglet, by underlining all of the sentences it contains then replacing these with S (e.g., the first sentence becomes S when S). Draw a tree structure for this "compressed" sentence. What are the main syntactic constructions used for building such a long sentence?

8. 0 In the recursive descent parser demo, experiment with changing the sentence to be parsed by selecting Edit Text in the Edit menu.

9. o Can the grammar in grammar1 (Example 8-1) be used to describe sentences that are more than 20 words in length?

10. o Use the graphical chart-parser interface to experiment with different rule invocation strategies. Come up with your own strategy that you can execute manually using the graphical interface. Describe the steps, and report any efficiency improvements it has (e.g., in terms of the size of the resulting chart). Do these improvements depend on the structure of the grammar? What do you think of the prospects for significant performance boosts from cleverer rule invocation strategies?

11. o With pen and paper, manually trace the execution of a recursive descent parser and a shift-reduce parser, for a CFG you have already seen, or one of your own devising.

12. o We have seen that a chart parser adds but never removes edges from a chart. Why?

13. o Consider the sequence of words: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. This is a grammatically correct sentence, as explained at http://en .wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buf falo. Consider the tree diagram presented on this Wikipedia page, and write down a suitable grammar. Normalize case to lowercase, to simulate the problem that a listener has when hearing this sentence. Can you find other parses for this sentence? How does the number of parse trees grow as the sentence gets longer? (More examples of these sentences can be found at http://en.wikipedia.org/wiki/List_of_ho mophonous_phrases.)

14. ® You can modify the grammar in the recursive descent parser demo by selecting Edit Grammar in the Edit menu. Change the first expansion production, namely

NP -> Det N PP, to NP -> NP PP. Using the Step button, try to build a parse tree. What happens?

15. ® Extend the grammar in grammar2 with productions that expand prepositions as intransitive, transitive, and requiring a PP complement. Based on these productions, use the method of the preceding exercise to draw a tree for the sentence Lee ran away home.

16. ® Pick some common verbs and complete the following tasks:

a. Write a program to find those verbs in the PP Attachment Corpus nltk.cor pus.ppattach. Find any cases where the same verb exhibits two different attachments, but where the first noun, or second noun, or preposition stays unchanged (as we saw in our discussion of syntactic ambiguity in Section 8.2).

b. Devise CFG grammar productions to cover some of these cases.

17. ® Write a program to compare the efficiency of a top-down chart parser compared with a recursive descent parser (Section 8.4). Use the same grammar and input sentences for both. Compare their performance using the timeit module (see Section 4.7 for an example of how to do this).

18. ® Compare the performance of the top-down, bottom-up, and left-corner parsers using the same grammar and three grammatical test sentences. Use timeit to log the amount of time each parser takes on the same sentence. Write a function that runs all three parsers on all three sentences, and prints a 3-by-3 grid of times, as well as row and column totals. Discuss your findings.

19. ® Read up on "garden path" sentences. How might the computational work of a parser relate to the difficulty humans have with processing these sentences? (See http://en.wikipedia.org/wiki/Garden_path_sentence.)

20. ® To compare multiple trees in a single window, we can use the draw_trees() method. Define some trees and try it out:

>>> from nltk.draw.tree import draw_trees >>> draw_trees(tree1, tree2, tree3)

21. ® Using tree positions, list the subjects of the first 100 sentences in the Penn tree-bank; to make the results easier to view, limit the extracted subjects to subtrees whose height is at most 2.

22. ® Inspect the PP Attachment Corpus and try to suggest some factors that influence PP attachment.

23. ® In Section 8.2, we claimed that there are linguistic regularities that cannot be described simply in terms of n-grams. Consider the following sentence, particularly the position of the phrase in his turn. Does this illustrate a problem for an approach based on n-grams?

What was more, the in his turn somewhat youngish Nikolay Parfenovich also turned out to be the only person in the entire world to acquire a sincere liking to our "dis-criminated-against" public procurator. (Dostoevsky: The Brothers Karamazov)

24. ® Write a recursive function that produces a nested bracketing for a tree, leaving out the leaf nodes and displaying the non-terminal labels after their subtrees. So the example in Section 8.6 about Pierre Vinken would produce: [[[NNP NNP]NP , [ADJP [CD NNS]NP JJ]ADJP ,]NP-SBJ MD [VB [DT NN]NP [IN [DT JJ NN]NP]PP-CLR [NNP CD]NP-TMP]VP .]S. Consecutive categories should be separated by space.

25. ® Download several electronic books from Project Gutenberg. Write a program to scan these texts for any extremely long sentences. What is the longest sentence you can find? What syntactic construction(s) are responsible for such long sentences?

26. ® Modify the functions init_wfst() and complete_wfst() so that the contents of each cell in the WFST is a set of non-terminal symbols rather than a single nonterminal.

27. ® Consider the algorithm in Example 8-3. Can you explain why parsing context-free grammar is proportional to n3, where n is the length of the input sentence?

28. ® Process each tree of the Penn Treebank Corpus sample nltk.corpus.treebank and extract the productions with the help of Tree.productions(). Discard the productions that occur only once. Productions with the same lefthand side and similar righthand sides can be collapsed, resulting in an equivalent but more compact set of rules. Write code to output a compact grammar.

29. • One common way of defining the subject of a sentence S in English is as the noun phrase that is the child of S and the sibling of VP. Write a function that takes the tree for a sentence and returns the subtree corresponding to the subject of the sentence. What should it do if the root node of the tree passed to this function is not S, or if it lacks a subject?

30. • Write a function that takes a grammar (such as the one defined in Example 8-1) and returns a random sentence generated by the grammar. (Use gram mar.start() to find the start symbol of the grammar; grammar.productions(lhs) to get the list of productions from the grammar that have the specified lefthand side; and production.rhs() to get the righthand side of a production.)

31. • Implement a version of the shift-reduce parser using backtracking, so that it finds all possible parses for a sentence, what might be called a "recursive ascent parser." Consult the Wikipedia entry for backtracking at http://en.wikipedia.org/wiki/Back tracking.

32. • As we saw in Chapter 7, it is possible to collapse chunks down to their chunk label. When we do this for sentences involving the word gave, we find patterns such as the following:

gave NP

gave up NP in NP gave NP up gave NP NP gave NP to NP

a. Use this method to study the complementation patterns of a verb of interest, and write suitable grammar productions. (This task is sometimes called lexical acquisition.)

b. Identify some English verbs that are near-synonyms, such as the dumped/filled/ loaded example from (64) in Chapter 9. Use the chunking method to study the complementation patterns of these verbs. Create a grammar to cover these cases. Can the verbs be freely substituted for each other, or are there constraints? Discuss your findings.

33. • Develop a left-corner parser based on the recursive descent parser, and inheriting from Parsel.

34. • Extend NLTK's shift-reduce parser to incorporate backtracking, so that it is guaranteed to find all parses that exist (i.e., it is complete).

35. • Modify the functions init_wfst() and complete_wfst() so that when a nonterminal symbol is added to a cell in the WFST, it includes a record of the cells from which it was derived. Implement a function that will convert a WFST in this form to a parse tree.

CHAPTER 9

Was this article helpful?

0 0

Post a comment