Python supports a wide range of operators, such as < and >=, for testing the relationship between values. The full set of these relational operators is shown in Table 1-3.

Table 1-3. Numerical comparison operators

| Operator | Relationship |
|----------|--------------|
| < | Less than |
| <= | Less than or equal to |
| == | Equal to (note this is two "=" signs, not one) |
| != | Not equal to |
| > | Greater than |
| >= | Greater than or equal to |

We can use these to select different words from a sentence of news text. Here are some examples; notice that only the operator changes from one line to the next. They all use sent7, the first sentence from text7 (Wall Street Journal). As before, if you get an error saying that sent7 is undefined, you need to first type: from nltk.book import *.

>>> sent7

['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the',

'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

>>> [w for w in sent7 if len(w) < 4]

[',', '61', 'old', ',', 'the', 'as', 'a', '29', '.']

>>> [w for w in sent7 if len(w) <= 4]

[',', '61', 'old', ',', 'will', 'join', 'the', 'as', 'a', 'Nov.', '29', '.']

>>> [w for w in sent7 if len(w) != 4]

['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'the', 'board',

'as', 'a', 'nonexecutive', 'director', '29', '.']
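The other operators from Table 1-3 can be tried in the same way. The following is a minimal sketch that copies sent7 in as a literal list, so it runs without NLTK installed; the variable names (exactly_four, longer_than_four, at_least_four) are illustrative choices, not part of NLTK.

```python
# sent7 copied from the listing above so the example is self-contained.
sent7 = ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join',
         'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

# Only the comparison operator differs between these three selections.
exactly_four = [w for w in sent7 if len(w) == 4]
longer_than_four = [w for w in sent7 if len(w) > 4]
at_least_four = [w for w in sent7 if len(w) >= 4]

print(exactly_four)       # ['will', 'join', 'Nov.']
print(longer_than_four)   # ['Pierre', 'Vinken', 'years', 'board', 'nonexecutive', 'director']
print(at_least_four)
```

Note that punctuation marks and numbers are tokens too, so len(w) counts the full stop in 'Nov.' as one of its four characters.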

There is a common pattern to all of these examples: [w for w in text if condition], where condition is a Python "test" that yields either true or false. In the cases shown in the previous code example, the condition is always a numerical comparison. However, we can also test various properties of words, using the functions listed in Table 1-4.
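To make the pattern concrete, the comprehension can be unrolled into an explicit loop. This is only a sketch for comparison: the helper name select and the short sample list are hypothetical and do not come from NLTK.

```python
def select(text, condition):
    """Return the items of text for which condition(item) is true."""
    selected = []
    for w in text:
        if condition(w):      # the "test" that yields True or False
            selected.append(w)
    return selected

# A short illustrative word list (an assumption, not an NLTK text).
text = ['the', 'board', 'as', 'a', 'director']

from_loop = select(text, lambda w: len(w) > 2)
from_comprehension = [w for w in text if len(w) > 2]

print(from_loop)                         # ['the', 'board', 'director']
print(from_loop == from_comprehension)   # True
```

The list comprehension is the more compact form, which is why it is used throughout these examples.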

| Function | Meaning |
|----------|---------|
| s.startswith(t) | Test if s starts with t |
| s.endswith(t) | Test if s ends with t |
| t in s | Test if t is contained inside s |
| s.islower() | Test if all cased characters in s are lowercase |
| s.isupper() | Test if all cased characters in s are uppercase |
| s.isalpha() | Test if all characters in s are alphabetic |
| s.isalnum() | Test if all characters in s are alphanumeric |
| s.isdigit() | Test if all characters in s are digits |
| s.istitle() | Test if s is titlecased (all words in s have initial capitals) |
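Each of these tests returns True or False, so they are easy to try on a few individual strings before applying them to a whole text. The sketch below uses a handful of sample strings chosen for illustration; the samples list and the describe helper are assumptions of this example, while the string methods themselves are standard Python.

```python
# Illustrative sample strings (not drawn from any particular NLTK text).
samples = ['Pierre', 'nonexecutive', 'NLTK', '61', 'Nov.', 'text7']

def describe(s):
    """Collect the result of several word tests for the string s."""
    return {
        'endswith(".")': s.endswith('.'),
        '"e" in s': 'e' in s,
        'islower': s.islower(),
        'isupper': s.isupper(),
        'isalpha': s.isalpha(),
        'isalnum': s.isalnum(),
        'isdigit': s.isdigit(),
        'istitle': s.istitle(),
    }

results = {s: describe(s) for s in samples}

for s in samples:
    true_tests = [name for name, value in results[s].items() if value]
    print(s, '->', true_tests)
```

Two details are worth noticing when you run this. A string with no cased characters at all, such as '61', is neither islower() nor isupper(). And 'Nov.' counts as titlecased even though it fails isalpha(), because the full stop is not a cased character.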

Here are some examples of these operators being used to select words from our texts: words ending with -ableness; words containing gnt; words having an initial capital; and words consisting entirely of digits.

>>> sorted([w for w in set(text1) if w.endswith('ableness')])

['comfortableness', 'honourableness', 'immutableness', 'indispensableness', ...]

>>> sorted([term for term in set(text4) if 'gnt' in term])

['Sovereignty', 'sovereignties', 'sovereignty']

>>> sorted([item for item in set(text6) if item.istitle()])

['A', 'Aaaaaaaaah', 'Aaaaaaaah', 'Aaaaaah', 'Aaaah', 'Aaaaugh', 'Aaagh', ...]

>>> sorted([item for item in set(sent7) if item.isdigit()])

['29', '61']

We can also create more complex conditions. If c is a condition, then not c is also a condition. If we have two conditions c1 and c2, then we can combine them to form a new condition using conjunction and disjunction: c1 and c2, c1 or c2.
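As a small illustration of negation, conjunction, and disjunction, the sketch below applies one of each to sent7, again copied in as a literal list so that NLTK is not required. The particular conditions and variable names are examples chosen here, not taken from the text.

```python
# sent7 copied from the earlier listing so the example is self-contained.
sent7 = ['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join',
         'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

# Negation: tokens that are NOT purely alphabetic.
not_alphabetic = [w for w in sent7 if not w.isalpha()]

# Conjunction: both conditions must hold.
long_titlecased = [w for w in sent7 if w.istitle() and len(w) > 4]

# Disjunction: at least one condition must hold.
digits_or_final_stop = [w for w in sent7 if w.isdigit() or w.endswith('.')]

print(not_alphabetic)         # [',', '61', ',', 'Nov.', '29', '.']
print(long_titlecased)        # ['Pierre', 'Vinken']
print(digits_or_final_stop)   # ['61', 'Nov.', '29', '.']
```

Python evaluates and/or from left to right and stops as soon as the result is known, so the cheaper or more selective test is usually worth putting first.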

Your Turn: Run the following examples and try to explain what is going on in each one. Next, try to make up some conditions of your own.

>>> sorted([w for w in set(text7) if '-' in w and 'index' in w])

>>> sorted([wd for wd in set(text3) if wd.istitle() and len(wd) > 10])

>>> sorted([w for w in set(sent7) if not w.islower()])

>>> sorted([t for t in set(text2) if 'cie' in t or 'cei' in t])
