## Discourse Representation Theory

The standard approach to quantification in first-order logic is limited to single sentences. Yet there seem to be examples where the scope of a quantifier can extend over two or more sentences. We saw one earlier, and here's a second example, together with a translation.

(54) a. Angus owns a dog. It bit Irene.

b. ∃x.(dog(x) & own(Angus, x) & bite(x, Irene))

That is, the NP *a dog* acts like a quantifier which binds the *it* in the second sentence. Discourse Representation Theory (DRT) was developed with the specific goal of providing a means for handling this and other semantic phenomena which seem to be characteristic of discourse. A discourse representation structure (DRS) presents the meaning of discourse in terms of a list of discourse referents and a list of conditions. The discourse referents are the things under discussion in the discourse, and they correspond to the individual variables of first-order logic. The DRS conditions apply to those discourse referents, and correspond to atomic open formulas of first-order logic. Figure 10-4 illustrates how a DRS for the first sentence in (54a) is augmented to become a DRS for both sentences.

When the second sentence of (54a) is processed, it is interpreted in the context of what is already present in the lefthand side of Figure 10-4. The pronoun *it* triggers the addition of a new discourse referent, say, u, and we need to find an anaphoric antecedent for it; that is, we want to work out what *it* refers to. In DRT, the task of finding the antecedent for an anaphoric pronoun involves linking it to a discourse referent already within the current DRS, and y (the referent introduced by *a dog*) is the obvious choice. (We will say more about anaphora resolution shortly.) This processing step gives rise to a new condition u = y. The remaining content contributed by the second sentence is also merged with the content of the first, and this is shown on the righthand side of Figure 10-4.
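The incremental update just described can be mimicked with a very small data structure. The sketch below is plain Python, not NLTK's API: the `DRS` class and its `add_referent`/`add_condition` methods are hypothetical, conditions are stored as `(predicate, arguments)` tuples, and we assume the referent for Irene is called z.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for a DRS (not NLTK's DrtExpression):
# a list of discourse referents plus a list of (predicate, args) conditions.
@dataclass
class DRS:
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

    def add_referent(self, name):
        # Discourse referents play the role of individual variables.
        if name not in self.referents:
            self.referents.append(name)

    def add_condition(self, predicate, *args):
        self.conditions.append((predicate, args))


# First sentence: "Angus owns a dog."
drs = DRS()
drs.add_referent("x")
drs.add_referent("y")
drs.add_condition("angus", "x")
drs.add_condition("dog", "y")
drs.add_condition("own", "x", "y")

# Second sentence: "It bit Irene.", interpreted in the context of drs.
drs.add_referent("u")             # new referent for the pronoun "it"
drs.add_condition("=", "u", "y")  # anaphora resolved to the dog referent y
drs.add_referent("z")             # assumed name for Irene's referent
drs.add_condition("irene", "z")
drs.add_condition("bite", "u", "z")

print(drs.referents)   # ['x', 'y', 'u', 'z']
print(drs.conditions)
```

The second sentence does not start a new structure; it extends the one built for the first sentence, which is what lets *it* reach back to y.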

Figure 10-4 illustrates how a DRS can represent more than just a single sentence. In this case, it is a two-sentence discourse, but in principle a single DRS could correspond to the interpretation of a whole text. We can inquire into the truth conditions of the righthand DRS in Figure 10-4. Informally, it is true in some situation s if there are entities a, c, and i in s corresponding to the discourse referents in the DRS such that all the conditions are true in s; that is, a is named Angus, c is a dog, a owns c, i is named Irene, and c bit i.

Figure 10-4. Building a DRS: The DRS on the lefthand side represents the result of processing the first sentence in the discourse, while the DRS on the righthand side shows the effect of processing the second sentence and integrating its content.
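These truth conditions can be checked by brute force against a toy situation. In this sketch (an assumption for illustration, not part of NLTK), a situation is a list of entities plus a set of atomic facts, and the DRS counts as true if some assignment of entities to referents satisfies every condition.

```python
from itertools import product

# Righthand DRS of Figure 10-4 as (predicate, args) conditions.
referents = ["x", "y", "u", "z"]
conditions = [
    ("angus", ("x",)), ("dog", ("y",)), ("own", ("x", "y")),
    ("=", ("u", "y")), ("irene", ("z",)), ("bite", ("u", "z")),
]


def holds(cond, g, facts):
    """Is one condition true under assignment g (referent -> entity)?"""
    predicate, args = cond
    values = tuple(g[a] for a in args)
    if predicate == "=":
        return values[0] == values[1]
    return (predicate, values) in facts


def is_true(referents, conditions, domain, facts):
    """Search every assignment of entities to referents (illustrative only)."""
    for values in product(domain, repeat=len(referents)):
        g = dict(zip(referents, values))
        if all(holds(c, g, facts) for c in conditions):
            return True
    return False


# A situation s containing entities a, c and i.
domain = ["a", "c", "i"]
facts = {("angus", ("a",)), ("dog", ("c",)), ("own", ("a", "c")),
         ("irene", ("i",)), ("bite", ("c", "i"))}

print(is_true(referents, conditions, domain, facts))  # True
print(is_true(referents, conditions, domain,
              facts - {("bite", ("c", "i"))}))         # False
```

The verifying assignment maps both y and u to c: the condition u = y forces the pronoun's referent and the dog's referent to pick out the same entity.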

In order to process DRSs computationally, we need to convert them into a linear format. Here's an example, where the DRS is a pair consisting of a list of discourse referents and a list of DRS conditions: `([x, y], [angus(x), dog(y), own(x, y)])`.

The easiest way to build a DRS object in NLTK is by parsing a string representation:

```python
>>> import nltk
>>> dp = nltk.DrtParser()
>>> drs1 = dp.parse('([x, y], [angus(x), dog(y), own(x, y)])')
>>> print drs1
([x,y],[angus(x), dog(y), own(x,y)])
```

We can use the `draw()` method to visualize the result, as shown in Figure 10-5.

```python
>>> drs1.draw()
```

Figure 10-5. DRS screenshot.
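To show what parsing the linear format involves, here is a deliberately small parser sketch for DRSs whose conditions are all atomic. It is not NLTK's `DrtParser`; the function names `split_top_level` and `parse_flat_drs` are hypothetical, and nested DRSs, implication and the `+` operator are out of scope.

```python
def split_top_level(text):
    """Split text on commas that are not nested inside () or []."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        parts.append(tail)
    return parts


def parse_flat_drs(text):
    """Parse '([refs], [conds])' where every condition is atomic, e.g. own(x, y)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesized (referents, conditions) pair")
    refs_part, conds_part = split_top_level(text[1:-1])
    # Drop the surrounding [ ] of each list before splitting its items.
    referents = split_top_level(refs_part[1:-1])
    conditions = []
    for cond in split_top_level(conds_part[1:-1]):
        predicate, _, rest = cond.partition("(")
        args = tuple(arg.strip() for arg in rest.rstrip(")").split(","))
        conditions.append((predicate.strip(), args))
    return referents, conditions


print(parse_flat_drs("([x, y], [angus(x), dog(y), own(x, y)])"))
```

The only subtle step is splitting on commas at nesting depth zero, since a condition such as `own(x, y)` contains commas of its own.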

When we discussed the truth conditions of the DRSs in Figure 10-4, we assumed that the topmost discourse referents were interpreted as existential quantifiers, while the conditions were interpreted as though they are conjoined. In fact, every DRS can be translated into a formula of first-order logic, and the fol() method implements this translation.

```python
>>> print drs1.fol()
exists x y.((angus(x) & dog(y)) & own(x,y))
```
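For a DRS whose conditions are all atomic, the translation amounts to existentially closing the referents and conjoining the conditions. The following plain-Python sketch (a hypothetical helper, not NLTK's `fol()`) reproduces the bracketing shown above by nesting the conjunctions to the left.

```python
def format_atom(predicate, args):
    # Mimic the compact argument style of the printed output: own(x,y)
    return f"{predicate}({','.join(args)})"


def flat_drs_to_fol(referents, conditions):
    """Translate a DRS with only atomic conditions into a FOL string."""
    if not conditions:
        raise ValueError("this sketch expects at least one condition")
    atoms = [format_atom(p, a) for p, a in conditions]
    body = atoms[0]
    for atom in atoms[1:]:
        body = f"({body} & {atom})"  # conjoin the conditions, left-nested
    if referents:
        # Top-level referents become existential quantifiers.
        return f"exists {' '.join(referents)}.{body}"
    return body


drs1 = (["x", "y"], [("angus", ("x",)), ("dog", ("y",)), ("own", ("x", "y"))])
print(flat_drs_to_fol(*drs1))
# exists x y.((angus(x) & dog(y)) & own(x,y))
```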

In addition to the functionality available for first-order logic expressions, DRT expressions have a DRS-concatenation operator, written as the + symbol. The concatenation of two DRSs is a single DRS containing the merged discourse referents and the conditions from both arguments. DRS concatenation automatically α-converts bound variables to avoid name clashes.

```python
>>> drs2 = dp.parse('([x], [walk(x)]) + ([y], [run(y)])')
>>> print drs2
(([x],[walk(x)]) + ([y],[run(y)]))
>>> print drs2.simplify()
([x,y],[walk(x), run(y)])
```
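The renaming step can be sketched as follows. This is a hypothetical illustration rather than NLTK's implementation: the function `concat` and the fresh-name scheme z1, z2, ... are our own choices, and only referents bound by the second DRS are renamed, so its free variables are left alone.

```python
from itertools import count


def concat(drs_a, drs_b):
    """Merge two (referents, conditions) DRSs, renaming clashes in drs_b."""
    refs_a, conds_a = drs_a
    refs_b, conds_b = drs_b
    # Collect every name already in use so a fresh name cannot clash.
    used = set(refs_a) | set(refs_b)
    used |= {arg for _, args in conds_a + conds_b for arg in args}
    fresh = (f"z{i}" for i in count(1) if f"z{i}" not in used)
    # Alpha-convert only referents bound by drs_b that drs_a also binds.
    mapping = {r: next(fresh) for r in refs_b if r in refs_a}

    def rename(args):
        return tuple(mapping.get(a, a) for a in args)

    merged_refs = refs_a + [mapping.get(r, r) for r in refs_b]
    merged_conds = conds_a + [(p, rename(args)) for p, args in conds_b]
    return merged_refs, merged_conds


print(concat((["x"], [("walk", ("x",))]), (["y"], [("run", ("y",))])))
# (['x', 'y'], [('walk', ('x',)), ('run', ('y',))])
print(concat((["x"], [("walk", ("x",))]), (["x"], [("run", ("x",))])))
# (['x', 'z1'], [('walk', ('x',)), ('run', ('z1',))])
```

Without the renaming, the second call would collapse two different individuals into a single referent x.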

While all the conditions seen so far have been atomic, it is possible to embed one DRS within another, and this is how universal quantification is handled. In drs3, there are no top-level discourse referents, and the sole condition is made up of two sub-DRSs, connected by an implication. Again, we can use fol() to get a handle on the truth conditions.

```python
>>> drs3 = dp.parse('([], [(([x], [dog(x)]) -> ([y],[ankle(y), bite(x, y)]))])')
>>> print drs3.fol()
all x.(dog(x) -> exists y.(ankle(y) & bite(x,y)))
```
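The translation of embedded DRSs can be sketched recursively: top-level referents are existentially quantified, while the referents of the antecedent DRS in an implication are universally quantified over the whole conditional. This is a hypothetical plain-Python sketch (the tagged-tuple encoding with `"->"` is our own), not NLTK's code, but it reproduces the output strings printed earlier for drs1 and drs3.

```python
def conjoin(formulas):
    """Left-nested conjunction: ((a & b) & c)."""
    if not formulas:
        raise ValueError("this sketch expects at least one condition")
    body = formulas[0]
    for formula in formulas[1:]:
        body = f"({body} & {formula})"
    return body


def condition_to_fol(cond):
    if cond[0] == "->":
        # Implication between sub-DRSs: ("->", antecedent_drs, consequent_drs)
        _, (ante_refs, ante_conds), consequent = cond
        antecedent = conjoin([condition_to_fol(c) for c in ante_conds])
        body = f"({antecedent} -> {drs_to_fol(consequent)})"
        # Referents introduced in the antecedent are universally quantified.
        return f"all {' '.join(ante_refs)}.{body}" if ante_refs else body
    predicate, args = cond  # atomic condition
    return f"{predicate}({','.join(args)})"


def drs_to_fol(drs):
    referents, conditions = drs
    body = conjoin([condition_to_fol(c) for c in conditions])
    # Top-level referents are existentially quantified.
    return f"exists {' '.join(referents)}.{body}" if referents else body


drs3 = ([], [("->",
              (["x"], [("dog", ("x",))]),
              (["y"], [("ankle", ("y",)), ("bite", ("x", "y"))]))])
print(drs_to_fol(drs3))
# all x.(dog(x) -> exists y.(ankle(y) & bite(x,y)))
```

Because the consequent DRS is translated in the scope of the universal quantifier, x remains bound inside `bite(x,y)`.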

We pointed out earlier that DRT is designed to allow anaphoric pronouns to be interpreted by linking to existing discourse referents. DRT sets constraints on which discourse referents are "accessible" as possible antecedents, but is not intended to explain how a particular antecedent is chosen from the set of candidates. The module nltk.sem.drt_resolve_anaphora adopts a similarly conservative strategy: if the DRS contains a condition of the form PRO(x), the method resolve_anaphora() replaces this with a condition of the form x = [...], where [...] is a list of possible antecedents.

```python
>>> drs4 = dp.parse('([x, y], [angus(x), dog(y), own(x, y)])')
>>> drs5 = dp.parse('([u, z], [PRO(u), irene(z), bite(u, z)])')
>>> drs6 = drs4 + drs5
>>> print drs6.simplify()
([x,y,u,z],[angus(x), dog(y), own(x,y), PRO(u), irene(z), bite(u,z)])
>>> print drs6.simplify().resolve_anaphora()
([x,y,u,z],[angus(x), dog(y), own(x,y), (u = [x,y,z]), irene(z), bite(u,z)])
```

Because the anaphora-resolution algorithm has been separated into its own module, it is easy to swap in alternative procedures that try to make more intelligent guesses about the correct antecedent.
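The conservative strategy, and the idea of swapping in a different one, can be sketched with a pluggable function. Everything here is hypothetical: `resolve_anaphora` is a stand-in rather than the NLTK method, it works only on flat DRSs (so it ignores DRT's accessibility constraints), and `most_recent` is a naive recency heuristic chosen purely for illustration.

```python
def all_candidates(pronoun, referents, conditions):
    """Conservative default: every other referent is a possible antecedent."""
    return [r for r in referents if r != pronoun]


def most_recent(pronoun, referents, conditions):
    """Naive alternative heuristic: the closest referent introduced earlier."""
    earlier = referents[:referents.index(pronoun)]
    return earlier[-1:]


def resolve_anaphora(drs, strategy=all_candidates):
    """Replace each PRO(u) condition with u = [candidates]."""
    referents, conditions = drs
    resolved = []
    for predicate, args in conditions:
        if predicate == "PRO":
            (pronoun,) = args
            candidates = strategy(pronoun, referents, conditions)
            resolved.append(("=", (pronoun, candidates)))
        else:
            resolved.append((predicate, args))
    return referents, resolved


drs6 = (["x", "y", "u", "z"],
        [("angus", ("x",)), ("dog", ("y",)), ("own", ("x", "y")),
         ("PRO", ("u",)), ("irene", ("z",)), ("bite", ("u", "z"))])
print(resolve_anaphora(drs6)[1][3])               # ('=', ('u', ['x', 'y', 'z']))
print(resolve_anaphora(drs6, most_recent)[1][3])  # ('=', ('u', ['y']))
```

Each strategy also receives the conditions, so a smarter procedure could, for instance, filter candidates by the predicates that hold of them.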

Our treatment of DRSs is fully compatible with the existing machinery for handling λ-abstraction, and consequently it is straightforward to build compositional semantic representations that are based on DRT rather than first-order logic. This technique is illustrated in the following rule for indefinites (which is part of the grammar drt.fcfg). For ease of comparison, we have added the parallel rule for indefinites from simple-sem.fcfg.

```
Det[NUM=sg,SEM=<\P Q.([x],[]) + P(x) + Q(x)>] -> 'a'
Det[NUM=sg,SEM=<\P Q. exists x.(P(x) & Q(x))>] -> 'a'
```

To get a better idea of how the DRT rule works, look at this subtree for the NP a dog:

```
(NP[NUM='sg', SEM=<\Q.(([x],[dog(x)]) + Q(x))>]
  (Det[NUM='sg', SEM=<\P Q.((([x],[]) + P(x)) + Q(x))>] a)
  (Nom[NUM='sg', SEM=<\x.([],[dog(x)])>]
    (N[NUM='sg', SEM=<\x.([],[dog(x)])>] dog)))
```

The λ-abstract for the indefinite is applied as a function expression to `\x.([],[dog(x)])`, which leads to `\Q.(([x],[]) + ([],[dog(x)]) + Q(x))`; after simplification, we get `\Q.(([x],[dog(x)]) + Q(x))` as the representation for the NP as a whole.
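The same function application can be imitated with Python closures standing in for λ-abstracts. In this sketch (our own encoding, not NLTK's), a DRS is a `(referents, conditions)` pair, `merge` plays the role of `+` followed by simplification, and no α-conversion is performed, so the variable name x is fixed. Lambdas are used deliberately here because they mirror the λ-notation of the grammar rule.

```python
def merge(*drss):
    """Concatenate DRSs and flatten them into one (simplification of +)."""
    refs, conds = [], []
    for r, c in drss:
        refs += r
        conds += c
    return refs, conds


# Det 'a':  \P Q.(([x],[]) + P(x) + Q(x))
det_a = lambda P: lambda Q: merge((["x"], []), P("x"), Q("x"))

# N 'dog':  \x.([],[dog(x)])
n_dog = lambda x: ([], [f"dog({x})"])

# NP 'a dog' = det_a(n_dog), behaving like \Q.(([x],[dog(x)]) + Q(x))
np_a_dog = det_a(n_dog)

# A hypothetical VP \x.([],[walk(x)]) supplied for Q:
print(np_a_dog(lambda x: ([], [f"walk({x})"])))
# (['x'], ['dog(x)', 'walk(x)'])
```

Applying the NP to whatever predicate fills Q yields one merged DRS in which x is introduced once and shared by both conditions.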

In order to parse with the grammar drt.fcfg, we specify in the call to `load_parser()` that SEM values in feature structures are to be parsed using DrtParser in place of the default LogicParser.

```python
>>> from nltk import load_parser
>>> parser = load_parser('grammars/book_grammars/drt.fcfg', logic_parser=nltk.DrtParser())
>>> trees = parser.nbest_parse('Angus owns a dog'.split())
>>> print trees[0].node['sem'].simplify()
([x,z2],[Angus(x), dog(z2), own(x,z2)])
```