A lexicon, or lexical resource, is a collection of words and/or phrases along with associated information, such as part-of-speech and sense definitions. Lexical resources are secondary to texts, and are usually created and enriched with the help of texts. For example, if we have defined a text my_text, then vocab = sorted(set(my_text)) builds the vocabulary of my_text, whereas word_freq = FreqDist(my_text) counts the frequency of each word in the text. Both vocab and word_freq are simple lexical resources. Similarly, a concordance like the one we saw in Section 1.1 gives us information about word usage that might help in the preparation of a dictionary. Standard terminology for lexicons is illustrated in Figure 2-5. A lexical entry consists of a headword (also known as a lemma) along with additional information, such as the part-of-speech and the sense definition. Two distinct words having the same spelling are called homonyms.

saw, [verb], past tense of see.
saw, [noun], cutting instrument.
(In each entry, the bracketed tag is the part of speech, or lexical category, and the text after it is the sense definition, or gloss.)

Figure 2-5. Lexicon terminology: Lexical entries for two lemmas having the same spelling (homonyms), providing part-of-speech and gloss information.
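The two simple lexical resources mentioned above, vocab and word_freq, can be tried out on a small example. The following is a minimal sketch: the token list bound to my_text is made up for illustration, and the fallback to collections.Counter is an assumption added here so that the example still runs where NLTK is not installed (it supports the counting operations used below in the same way).

```python
# FreqDist comes from NLTK. As an assumption for this sketch, fall back to the
# standard-library Counter, which supports the same lookups used below.
try:
    from nltk import FreqDist
except ImportError:
    from collections import Counter as FreqDist

# Hypothetical toy text: a list of word tokens standing in for my_text.
my_text = ["the", "saw", "cut", "the", "log", "and", "I", "saw", "the", "cut"]

vocab = sorted(set(my_text))    # distinct words, in sorted order
word_freq = FreqDist(my_text)   # maps each word to its number of occurrences

print(vocab)                                 # ['I', 'and', 'cut', 'log', 'saw', 'the']
print(word_freq["the"], word_freq["saw"])    # 3 2
```

Note that "I" comes first in vocab because sorted() orders uppercase letters before lowercase ones. Lowercasing the tokens before building the set would merge words that differ only in capitalization.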
The simplest kind of lexicon is nothing more than a sorted list of words. Sophisticated lexicons include complex structure within and across the individual entries. In this section, we'll look at some lexical resources included with NLTK.
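To make the idea of structure within and across entries concrete, here is one possible way to represent lexical entries in plain Python. This is only a sketch under stated assumptions: the LexicalEntry class and its field names are hypothetical and are not part of NLTK, and the "log" entry is invented for contrast; the two "saw" entries are the ones from Figure 2-5.

```python
from collections import defaultdict
from typing import NamedTuple


class LexicalEntry(NamedTuple):
    """Hypothetical record for one lexical entry (not an NLTK class)."""
    headword: str  # the lemma
    pos: str       # part of speech, or lexical category
    gloss: str     # sense definition


entries = [
    LexicalEntry("saw", "verb", "past tense of see"),
    LexicalEntry("saw", "noun", "cutting instrument"),
    LexicalEntry("log", "noun", "section of a tree trunk"),  # invented example
]

# Group the entries by headword, so that one spelling can carry several entries.
lexicon = defaultdict(list)
for entry in entries:
    lexicon[entry.headword].append(entry)

# Headwords with more than one entry are candidate homonyms.
homonyms = sorted(word for word, group in lexicon.items() if len(group) > 1)
print(homonyms)  # ['saw']
```

Grouping by spelling only flags candidates: deciding whether two entries are truly distinct words, rather than related senses of one word, still requires looking at the entries themselves.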