Simplified Part-of-Speech Tagset

Tagged corpora use many different conventions for tagging words. To help us get started, we will be looking at a simplified tagset (shown in Table 5-1).

Table 5-1. Simplified part-of-speech tagset

Tag   Meaning             Examples

ADJ   adjective           new, good, high, special, big, local
ADV   adverb              really, already, still, early, now
CNJ   conjunction         and, or, but, if, while, although
DET   determiner          the, a, some, most, every, no
EX    existential         there, there's
FW    foreign word        dolce, ersatz, esprit, quo, maitre
MOD   modal verb          will, can, would, may, must, should
N     noun                year, home, costs, time, education
NP    proper noun         Alison, Africa, April, Washington
NUM   number              twenty-four, fourth, 1991, 14:24
PRO   pronoun             he, their, her, its, my, I, us
P     preposition         on, of, at, with, by, into, under
TO    the word to         to
UH    interjection        ah, bang, ha, whee, hmpf, oops
V     verb                is, has, get, do, make, see, run
VD    past tense          said, took, told, made, asked
VG    present participle  making, going, playing, working
VN    past participle     given, taken, begun, sung
WH    wh determiner       who, which, when, what, where, how

Figure 5-1. POS-tagged data from four Indian languages: Bangla, Hindi, Marathi, and Telugu.

Let's see which of these tags are the most common in the news category of the Brown Corpus:

>>> import nltk
>>> from nltk.corpus import brown
>>> brown_news_tagged = brown.tagged_words(categories='news', simplify_tags=True)
>>> tag_fd = nltk.FreqDist(tag for (word, tag) in brown_news_tagged)
>>> tag_fd.keys()
['N', 'P', 'DET', 'NP', 'V', 'ADJ', ',', '.', 'CNJ', 'PRO', 'ADV', 'VD', ...]

Your Turn: Plot the frequency distribution just shown using tag_fd.plot(cumulative=True). What percentage of words are tagged using the first five tags of the above list?
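The cumulative share that such a plot displays can also be computed by hand. The following is a minimal sketch using only the standard library, not NLTK's own method: `toy_tagged` is a small made-up list standing in for `brown_news_tagged`, and `top_tag_coverage` is a hypothetical helper name introduced here for illustration.

```python
from collections import Counter

# Hypothetical toy sample of (word, tag) pairs in the simplified tagset.
# It stands in for brown_news_tagged and is NOT real corpus data.
toy_tagged = [
    ("the", "DET"), ("year", "N"), ("said", "VD"), ("the", "DET"),
    ("home", "N"), ("on", "P"), ("Alison", "NP"), ("is", "V"),
    ("new", "ADJ"), (".", "."),
]

def top_tag_coverage(tagged_words, n):
    """Return the fraction of tokens covered by the n most frequent tags."""
    counts = Counter(tag for (_, tag) in tagged_words)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tokens, so nothing is covered
    covered = sum(count for (_, count) in counts.most_common(n))
    return covered / total

# Share of tokens covered by the two most frequent tags in the toy sample.
print(top_tag_coverage(toy_tagged, 2))
```

Passing the real tagged word list instead of the toy sample, with `n=5`, should give the figure the exercise asks about.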

We can use these tags to run powerful searches with the graphical POS-concordance tool nltk.app.concordance(). Use it to search for any combination of words and POS tags, e.g., N N N N, hit/VD, hit/VN, or the ADJ man.
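To make the query forms concrete, here is a rough sketch of how patterns such as hit/VD or the ADJ man could be matched against a list of (word, tag) pairs. It is an illustration only and not how the concordance tool is implemented. The function names `matches` and `find_pattern`, the toy sentence, and the rule that an all-uppercase item is read as a tag are all assumptions made for this example.

```python
def matches(token, item):
    """Check one (word, tag) token against one pattern item.

    Assumed pattern syntax: 'word/TAG' matches both word and tag,
    an all-uppercase item matches the tag, anything else matches the word.
    """
    word, tag = token
    if "/" in item:
        p_word, p_tag = item.rsplit("/", 1)
        return word.lower() == p_word.lower() and tag == p_tag
    if item.isupper():
        return tag == item
    return word.lower() == item.lower()

def find_pattern(tagged_words, pattern):
    """Return the start index of every place the pattern matches."""
    items = pattern.split()
    if not items:
        return []  # an empty pattern matches nothing
    n = len(items)
    hits = []
    for i in range(len(tagged_words) - n + 1):
        window = tagged_words[i:i + n]
        if all(matches(tok, item) for tok, item in zip(window, items)):
            hits.append(i)
    return hits

# Hypothetical toy sentence, not taken from the Brown Corpus.
toy = [("the", "DET"), ("big", "ADJ"), ("man", "N"), ("hit", "VD"),
       ("the", "DET"), ("new", "ADJ"), ("man", "N")]

print(find_pattern(toy, "the ADJ man"))
print(find_pattern(toy, "hit/VD"))
```

One limitation of this sketch is that an uppercase word such as I is read as a tag, so a real tool would need a less ambiguous query syntax.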
