Dynamic Programming

Dynamic programming is a general technique for designing algorithms which is widely used in natural language processing. The term "programming" is used in a different sense to what you might expect, to mean planning or scheduling. Dynamic programming is used when a problem contains overlapping subproblems. Instead of computing solutions to these subproblems repeatedly, we simply store them in a lookup table. In the remainder of this section, we will introduce dynamic programming, but in a rather different context to syntactic parsing.

Pingala was an Indian author who lived around the 5th century B.C., and wrote a treatise on Sanskrit prosody called the Chandas Shastra. Virahanka extended this work around the 6th century A.D., studying the number of ways of combining short and long syllables to create a meter of length n. Short syllables, marked S, take up one unit of length, while long syllables, marked L, take two. Pingala found, for example, that there are five ways to construct a meter of length 4: V4 = {LL, SSL, SLS, LSS, SSSS}. Observe that we can split V4 into two subsets, those starting with L and those starting with S, as shown in (1).

(1)  V4 =
       LL, LSS                i.e. L prefixed to each item of V2 = {L, SS}
       SSL, SLS, SSSS         i.e. S prefixed to each item of V3 = {SL, LS, SSS}

With this observation, we can write a little recursive function called virahanka1() to compute these meters, shown in Example 4-9. Notice that, in order to compute V4, we first compute V3 and V2. But to compute V3, we first need to compute V2 and V1. This call structure is depicted in (2).

Example 4-9. Four ways to compute Sanskrit meter: (i) recursive, (ii) bottom-up dynamic programming, (iii) top-down dynamic programming, and (iv) built-in memoization.

s = ["S" + prosody for prosody in virahankal(n-l)] l = ["L" + prosody for prosody in virahankal(n-2)] return s + l def virahanka2(n):

lookup = [[""], ["S"]] for i in range(n-l):

s = ["S" + prosody for prosody in lookup[i+l]] l = ["L" + prosody for prosody in lookup[i]] lookup.append(s + l) return lookup[n]

def virahanka3(n, lookup={0:[""], l:["S"]}): if n not in lookup:

s = ["S" + prosody for prosody in virahanka3(n-l)] l = ["L" + prosody for prosody in virahanka3(n-2)] lookup[n] = s + l return lookup[n]

from nltk import memoize @memoize def virahanka4(n): if n == 0:

s = ["S" + prosody for prosody in virahanka4(n-l)] l = ["L" + prosody for prosody in virahanka4(n-2)] return s + l

>>> virahankal(4) ['SSSS', 'SSL', 'SLS', >>> virahanka2(4) ['SSSS', 'SSL', 'SLS', >>> virahanka3(4) ['SSSS', 'SSL', 'SLS', >>> virahanka4(4) ['SSSS', 'SSL', 'SLS',

(2)
                V4
              /    \
            V3      V2
           /  \    /  \
         V2    V1 V1   V0
        /  \
      V1    V0

As you can see, V2 is computed twice. This might not seem like a significant problem, but it turns out to be rather wasteful as n gets large: to compute V20 using this recursive technique, we would compute V2 4,181 times; and for V40 we would compute V2 63,245,986 times! A much better alternative is to store the value of V2 in a table and look it up whenever we need it. The same goes for other values, such as V3 and so on. Function virahanka2() implements a dynamic programming approach to the problem. It works by filling up a table (called lookup) with solutions to all smaller instances of the problem, stopping as soon as we reach the value we're interested in. At this point we read off the value and return it. Crucially, each subproblem is only ever solved once.
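
If you would like to verify these counts, the following sketch reruns the naive recursion while tallying how many times each subproblem is visited. The function virahanka_counted() and the calls counter are our own additions for this experiment; they are not part of Example 4-9.

from collections import Counter

calls = Counter()          # records how many times each value of n is computed

def virahanka_counted(n):
    calls[n] += 1          # tally this subproblem before solving it
    if n == 0:
        return [""]
    elif n == 1:
        return ["S"]
    else:
        s = ["S" + prosody for prosody in virahanka_counted(n-1)]
        l = ["L" + prosody for prosody in virahanka_counted(n-2)]
        return s + l

>>> result = virahanka_counted(20)
>>> calls[2]
4181

Repeating the experiment for V40 is not recommended: the naive recursion would have to build well over a hundred million strings before finishing, which is precisely the waste that the lookup table eliminates.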

Notice that the approach taken in virahanka2() is to solve smaller problems on the way to solving larger problems. Accordingly, this is known as the bottom-up approach to dynamic programming. Unfortunately, it turns out to be quite wasteful for some applications, since it may compute solutions to subproblems that are never required for solving the main problem. This wasted computation can be avoided using the top-down approach to dynamic programming, which is illustrated in the function virahanka3() in Example 4-9. Unlike the bottom-up approach, this approach is recursive. It avoids the huge wastage of virahanka1() by checking whether it has previously stored the result. If not, it computes the result recursively and stores it in the table. The last step is to return the stored result. The final method, in virahanka4(), is to use a Python "decorator" called memoize, which takes care of the housekeeping work done by virahanka3() without cluttering up the program. This "memoization" process stores the result of each previous call to the function along with the parameters that were used. If the function is subsequently called with the same parameters, it returns the stored result instead of recalculating it. (This aspect of Python syntax is beyond the scope of this book.)
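
Although the workings of decorators are beyond our scope, the idea behind memoization itself can be sketched in a few lines of ordinary Python. The following is a minimal illustration, not NLTK's actual implementation; the names my_memoize and virahanka5 are introduced here purely for this example, and the sketch only handles positional arguments that can serve as dictionary keys.

def my_memoize(function):
    cache = {}                       # maps argument tuples to stored results
    def wrapper(*args):
        if args not in cache:        # first call with these arguments: compute and store
            cache[args] = function(*args)
        return cache[args]           # later calls: return the stored result
    return wrapper

@my_memoize
def virahanka5(n):
    if n == 0:
        return [""]
    elif n == 1:
        return ["S"]
    else:
        s = ["S" + prosody for prosody in virahanka5(n-1)]
        l = ["L" + prosody for prosody in virahanka5(n-2)]
        return s + l

>>> virahanka5(4)
['SSSS', 'SSL', 'SLS', 'LSS', 'LL']

More recent versions of Python also ship a similar decorator in the standard library, functools.lru_cache, which can be used in much the same way.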

This concludes our brief introduction to dynamic programming. We will encounter it again in Section 8.4.

Python has hundreds of third-party libraries, specialized software packages that extend the functionality of Python. NLTK is one such library. To realize the full power of Python programming, you should become familiar with several other libraries. Most of these will need to be manually installed on your computer.
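
Since a third-party library may or may not be present on a given machine, it is worth checking for it before relying on it. Here is a minimal sketch; NumPy is used purely as an example of such a library.

try:
    import numpy                     # a widely used third-party numerical library
except ImportError:
    print("NumPy is not installed; install it before running code that depends on it.")
else:
    print("NumPy is available, version", numpy.__version__)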

