## Syntax

First-order logic keeps all the Boolean operators of propositional logic, but it adds some important new mechanisms. To start with, propositions are analyzed into predicates and arguments, which takes us a step closer to the structure of natural languages. The standard construction rules for first-order logic recognize terms such as individual variables and individual constants, and predicates that take differing numbers of arguments. For example, Angus walks might be formalized as walk(angus) and Angus sees Bertie as see(angus, bertie). We will call walk a unary predicate, and see a binary predicate. The symbols used as predicates do not have intrinsic meaning, although it is hard to remember this. Returning to one of our earlier examples, there is no logical difference between (13a) and (13b).

(13) a. love(margrietje, brunoke)

b. houden_van(margrietje, brunoke)

By itself, first-order logic has nothing substantive to say about lexical semantics—the meaning of individual words—although some theories of lexical semantics can be encoded in first-order logic. Whether an atomic predication like see(angus, bertie) is true or false in a situation is not a matter of logic, but depends on the particular valuation that we have chosen for the constants see, angus, and bertie. For this reason, such expressions are called non-logical constants. By contrast, logical constants (such as the Boolean operators) always receive the same interpretation in every model for first-order logic.

We should mention here that one binary predicate has special status, namely equality, as in formulas such as angus = aj. Equality is regarded as a logical constant, since for individual terms ti and t2, the formula ti = t2 is true if and only if ti and t2 refer to one and the same entity.

It is often helpful to inspect the syntactic structure of expressions of first-order logic, and the usual way of doing this is to assign types to expressions. Following the tradition of Montague grammar, we will use two basic types: e is the type of entities, while t is the type of formulas, i.e., expressions that have truth values. Given these two basic types, we can form complex types for function expressions. That is, given any types CT and T, (ct, t) is a complex type corresponding to functions from 'CT things' to 't things'. For example, (e, t) is the type of expressions from entities to truth values, namely unary predicates. The LogicParser can be invoked so that it carries out type checking.

>>> tlp = nltk.LogicParser(type_check=True) >>> parsed = tlp.parse('walk(angus)') >>> parsed.argument <ConstantExpression angus> >>> parsed.argument.type e

>>> parsed.function <ConstantExpression walk> >>> parsed.function.type <e,?>

Why do we see <e,?> at the end of this example? Although the type-checker will try to infer as many types as possible, in this case it has not managed to fully specify the type of walk, since its result type is unknown. Although we are intending walk to receive type <e, t>, as far as the type-checker knows, in this context it could be of some other type, such as <e, e> or <e, <e, t>. To help the type-checker, we need to specify a signature, implemented as a dictionary that explicitly associates types with non-logical constants:

>>> parsed = tlp.parse('walk(angus)', sig)

>>> parsed.function.type

A binary predicate has type (e, (e, t)). Although this is the type of something which combines first with an argument of type e to make a unary predicate, we represent binary predicates as combining directly with their two arguments. For example, the predicate see in the translation of Angus sees Cyril will combine with its arguments to give the result see(angus, cyril).

In first-order logic, arguments of predicates can also be individual variables such as x, y, and z. In NLTK, we adopt the convention that variables of type e are all lowercase. Individual variables are similar to personal pronouns like he, she, and it, in that we need to know about the context of use in order to figure out their denotation. One way of interpreting the pronoun in (14) is by pointing to a relevant individual in the local context.

(14) He disappeared.

Another way is to supply a textual antecedent for the pronoun he, for example, by uttering (15a) prior to (14). Here, we say that he is coreferential with the noun phrase Cyril. In such a context, (14) is semantically equivalent to (15b).

(15) a. Cyril is Angus's dog. b. Cyril disappeared.

Consider by contrast the occurrence of he in (16a). In this case, it is bound by the indefinite NP a dog, and this is a different relationship than coreference. If we replace the pronoun he by a dog, the result (16b) is not semantically equivalent to (16a).

(16) a. Angus had a dog but he disappeared.

b. Angus had a dog but a dog disappeared.

Corresponding to (17a), we can construct an open formula (17b) with two occurrences of the variable x. (We ignore tense to simplify exposition.)

(17) a. He is a dog and he disappeared. b. dog(x) & disappear(x)

By placing an existential quantifier 3x ("for some x") in front of (17b), we can bind these variables, as in (18a), which means (18b) or, more idiomatically, (18c).

b. At least one entity is a dog and disappeared.

c. A dog disappeared.

Here is the NLTK counterpart of (18a):

In addition to the existential quantifier, first-order logic offers us the universal quantifier Vx ("for all x"), illustrated in (20).

b. Everything has the property that if it is a dog, it disappears.

c. Every dog disappeared.

Here is the NLTK counterpart of (20a):

Although (20a) is the standard first-order logic translation of (20c), the truth conditions aren't necessarily what you expect. The formula says that if some x is a dog, then x disappears—but it doesn't say that there are any dogs. So in a situation where there are no dogs, (20a) will still come out true. (Remember that (P -> Q) is true when P is false.) Now you might argue that every dog disappeared does presuppose the existence of dogs, and that the logic formalization is simply wrong. But it is possible to find other examples that lack such a presupposition. For instance, we might explain that the value of the Python expression astring.replace('ate', '8') is the result of replacing every occurrence of 'ate' in astring by '8', even though there may in fact be no such occurrences (Table 3-2).

We have seen a number of examples where variables are bound by quantifiers. What happens in formulas such as the following? ((exists x. dog(x)) -> bark(x))

The scope of the exists x quantifier is dog(x), so the occurrence of x in bark(x) is unbound. Consequently it can become bound by some other quantifier, for example, all x in the next formula:

In general, an occurrence of a variable x in a formula 9 is free in 9 if that occurrence doesn't fall within the scope of all x or some x in 9. Conversely, if x is free in formula 9, then it is bound in all x.9 and exists x.9. If all variable occurrences in a formula are bound, the formula is said to be closed.

We mentioned before that the parse() method of NLTK's LogicParser returns objects of class Expression. Each instance expr of this class comes with a method free(), which returns the set of variables that are free in expr.

>>> lp.parse('dog(x)').free() set([Variable('x')])

>>> lp.parse('own(angus, cyril)').free() set([])

>>> lp.parse('exists x.dog(x)').free() set([])

>>> lp.parse('((some x. walk(x)) -> sing(x))').free() set([Variable('x')])

>>> lp.parse('exists x.own(y, x)').free() set([Variable('y')])

0 0