## Using random to Solve Probability Questions

The following examples use the random module to solve probability-based questions numerically.

Return to zero: Consider the following: a hard disk head is normally resting at location 0, representing the start of the disk. Files (of size zero) are evenly distributed between location 0 and 1, where 1 represents the end of the disk. The head is required to access files randomly. After each read, the head returns to location zero. The question is, what is the average distance the head moves?

The answer is not hard: on average, the head moves a distance of 1.0 (don't forget it has to go back to location 0). You can easily verify this using a simple script:

>>> from random import random

>>> N = 1000 # number of files the head seeks

»> tot_dist/N 1.0253061728681021

The larger the value of N, (and assuming a good random() implementation), the more accurate the result.

Not returning to zero: Now consider the scenario where the head does not go back to location 0, but stays where it was before. Finding the average distance the head moves is a bit harder analytically, but numerically, with a simple script, the solution emerges quickly.

>>> from random import random

>>> N = 1000 # number of files the head seeks

»> tot_dist/N 0.33576448266202374

This number turns out to be 1/3.

Example: Friends Meeting

We turn to another example, one that makes use of a visual output as well.

Two friends decide to meet between 8 p.m. and 9 p.m. Once one of the friends arrives at the designated meeting spot, he waits for 10 minutes for his friend to show up. So if for example Friend 1 arrives at 8:40, he'll wait until 8:50 for Friend 2 to show up. Friend 1 doesn't know if Friend 2 already showed up earlier (the same is true for Friend 2, he doesn't know if Friend 1 showed up). But both friends are smart enough to know that if they arrive at 8:55, for example, they only need wait until 9:00 and not 9:05. The question: what's the probability that these two friends meet?

We again turn to the random module to help us solve this problem (see Listing 7-2). Only this time, we also visualize the result, hopefully gaining some insight as to how to solve the question analytically.

Listing 7-2. Friends Meeting from random import random from pylab import *

N = 40000 # number of events

# generate N events of friends times friendl, friend2 = [], []

friendl.append(random()) friend2.append(random())

# find all occurrences of friends meeting met = array([(x, y) for (x, y) in zip(friendl, friend2) \

if abs(y-x) < 1.0/6 ]) not_met = array([(x, y) for (x, y) in zip(friendl, friend2) \ if abs(y-x) >= 1.0/6 ])

# plot the result, this might shed some plot(met[:, 0], met[:, l],'+m') plot(not_met[:, 0], not_met[:, l],'og') title("Probability of meeting: %1.3f" % xlabel('Time of arrival of Friend 1') ylabel('Time of arrival of Friend 2') axis('scaled')

The first step is to generate a considerable number of events, in this case 40,000. An event is composed of two numbers: one associated with Friend 1's time of arrival and one associated with Friend 2's time of arrival. We store both their times in lists. The process of generating the events is performed in the first for loop. The function random() returns a value between 0 and 1, which maps out to the time of arrival: 0 is 8 p.m., 1 is 9 p.m.

Now that we have a considerable number of events, we ask at what times the friends meet. The friends meet if the difference between their times of arrival is less than 10 minutes, or 10 minutes / 60 minutes * 1.0 = 1/6 (1.0 is the range of random values). But it's also possible that Friend 1 arrives after Friend 2 and not the other way around. So we should be asking whether friendl-friend2 is less than 1/6, as well as whether friend2-friendl is less than 1/6. This can be elegantly coded as abs(friendl-friend2) < 1.0/6.

The actual implementation makes use of a list comprehension, returning a tuple of (x, y) values that match the condition abs(x-y) < 1.0/6., which means the friends have met. We then construct an array of these values (a NumPy array, more on this shortly) so we can easily access the x and y vectors, without any for loops. We also build a list of times the friends did not meet because we want to plot both, in different colors and markers.

Next we plot the results and calculate the probability of the friends meeting, numerically, as shown in Figure 7-3.

This visualization really helps. The corridor in the middle describes the events corresponding to the two friends meeting. The probability is the area of this corridor and can be calculated by the area of the entire square minus the area of the top-left triangle and the bottom-right triangle. Each triangle has an area of 0.5 x (5/6)2, and the total probability of meeting is 1 - (5/6)2 = 11/36 = 0.3055 . . . which is pretty close to the estimated numerical value (displayed in the figure title).

Probability of meeting: 0,307

Probability of meeting: 0,307

Time of arrival of Friend 1

Figure 7-3. Visualizing friends meeting

Time of arrival of Friend 1

Figure 7-3. Visualizing friends meeting