What is a text? At one level, it is a sequence of symbols on a page such as this one. At another level, it is a sequence of chapters, made up of a sequence of sections, where each section is a sequence of paragraphs, and so on. However, for our purposes, we will think of a text as nothing more than a sequence of words and punctuation. Here's how we represent text in Python, in this case the opening sentence of Moby Dick:
>>> sent1 = ['Call', 'me', 'Ishmael', '.']
>>>
After the prompt we've given a name we made up, sent1, followed by the equals sign, and then some quoted words, separated with commas, and surrounded with brackets. This bracketed material is known as a list in Python: it is how we store a text. We can inspect it by typing the name. We can ask for its length. We can even apply our own lexical_diversity() function to it.
>>> sent1
['Call', 'me', 'Ishmael', '.']
>>> len(sent1)
4
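The three operations just described can be tried together in a plain Python script. The sketch below assumes a lexical_diversity() definition along the lines of the one referred to above (distinct tokens divided by total tokens); the earlier definition may differ in detail, so treat this version as an illustration only.

```python
def lexical_diversity(text):
    # Assumed definition: ratio of distinct tokens to total tokens.
    return len(set(text)) / len(text)

sent1 = ['Call', 'me', 'Ishmael', '.']

length = len(sent1)                    # counts every token, punctuation included
diversity = lexical_diversity(sent1)   # every token here is distinct

print(sent1)
print(length)
print(diversity)
```

Because no token in sent1 repeats, the number of distinct tokens equals the length of the list.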
Some more lists have been defined for you, one for the opening sentence of each of our texts, sent2 ... sent9. We inspect two of them here; you can see the rest for yourself using the Python interpreter (if you get an error saying that sent2 is not defined, you need to first type from nltk.book import *).
>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']
>>> sent3
['In', 'the', 'beginning', 'God', 'created', 'the',
Your Turn: Make up a few sentences of your own, by typing a name, equals sign, and a list of words, like this: ex1 = ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']. Repeat some of the other Python operations we saw earlier in Section 1.1, e.g., sorted(ex1), len(set(ex1)), ex1.count('the').
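As a starting point for the exercise, here is a minimal sketch of those three operations applied to the example list. The variable names other than ex1 are made up for illustration.

```python
ex1 = ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']

in_order = sorted(ex1)           # new sorted list; capitalized words sort before lowercase
distinct = len(set(ex1))         # number of distinct words
the_count = ex1.count('the')     # occurrences of the exact string 'the'

print(in_order)
print(distinct)
print(the_count)
```

Note that sorted() returns a new list and leaves ex1 in its original order.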
A pleasant surprise is that we can use Python's addition operator on lists. Adding two lists creates a new list with everything from the first list, followed by everything from the second list:
>>> ['Monty', 'Python'] + ['and', 'the', 'Holy', 'Grail']
['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']
This special use of the addition operation is called concatenation; it combines the lists together into a single list. We can concatenate sentences to build up a text.
We don't have to literally type the lists either; we can use short names that refer to predefined lists.
['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the',
'House', 'of', 'Representatives', ':', 'Call', 'me', 'Ishmael', '.']
>>>
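Concatenating through names works the same way as concatenating literal lists, and it leaves the original lists untouched. The sketch below uses two short lists defined on the spot rather than the predefined ones, so that it runs without NLTK; the names first, second, and combined are hypothetical.

```python
first = ['Monty', 'Python']
second = ['and', 'the', 'Holy', 'Grail']

combined = first + second    # builds a brand-new list

print(combined)
print(first)                 # unchanged by the concatenation
print(second)                # unchanged by the concatenation
```

Since + always produces a new list, the result has to be stored under a name if we want to keep it.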
What if we want to add a single item to a list? This is known as appending. When we append() to a list, the list itself is updated as a result of the operation.
>>> sent1.append("Some")
>>> sent1
['Call', 'me', 'Ishmael', '.', 'Some']
>>>
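The difference between appending and concatenating can be seen side by side. In this sketch, sent is a fresh copy of the example sentence and the other names are made up for illustration: + returns a new list, while append() changes the existing list in place and returns None.

```python
sent = ['Call', 'me', 'Ishmael', '.']

longer = sent + ['Some']         # concatenation: new list, sent is untouched
length_before = len(sent)

result = sent.append('Some')     # appending: sent itself is updated
length_after = len(sent)

print(longer)
print(sent)
print(result)                    # append() has no useful return value
```

A common slip is writing sent = sent.append('Some'), which replaces the list with None.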