## Records with Multiple Fields

Here is some United States housing data for 1983 and 1984, also taken from [Hynnd]. The first column is the monthly housing starts (thousands of units), the second is the total construction contracts (millions of dollars), and the third is the average interest rate for a new home mortgage (percent):

fileproc/housing.dat:

    91.3 11.358 13
    96.3 11.355 12.62
    134.6 16.100 12.97
    135.8 16.315 12.02
    174.9 19.205 12.21
    173.2 20.263 11.9
    161.6 16.885 12.02
    176.8 19.441 12.01
    154.9 17.379 12.08
    159.3 16.28 11.8
    136 15.401 11.82
    108.3 13.518 11.94
    109.1 14.023 11.8
    130 14.442 11.78
    137.5 17.916 11.56
    172.7 17.655 11.55
    180.7 21.990 11.68
    184 20.36 11.61
    162.1 19.224 11.91
    147.4 19.367 11.89
    148.5 16.923 12.03
    152.3 18.413 12.27
    126.2 16.616 12.27
    98.9 14.220 12.05

This data differs from the previous example: there are again multiple values per line, but here each value represents something different. We want to compare total housing starts and construction contracts from 1983 to 1984; for the moment, we don't care about interest rates.

What if we decide to ask more questions about this data in the future? Instead of rereading the data from the file, we can store the data for future use. But how will we store it? We can create twelve lists, one for each month, or two lists, one for housing starts and one for total construction contracts, and store the data by column. Another option is to create a list of lists to keep all the data together. Twelve variables feels like too many, so let's store the data by column using two lists. (A lot of program design is based on what "feels" right. There are no universal hard-and-fast rules for good design; there are only trade-offs and consequences.) Using the two lists to store the data, we compare housing starts and construction contracts from 1983 to 1984:

    import sys

    def housing(r):
        '''Return the difference between the housing starts and construction
        contracts in 1983 and in 1984 from reader r.'''

        # The monthly housing starts, in thousands of units.
        starts = []

        # The construction contracts, in millions of dollars.
        contracts = []

        # Read the file, populating the lists.
        for line in r:
            start, contract, rate = line.split()
            starts.append(float(start))
            contracts.append(float(contract))

        # Rows 0-11 are the twelve months of 1983; rows 12-23 are 1984.
        # A slice excludes its end index, so the ends are 12 and 24.
        return (sum(starts[12:24]) - sum(starts[0:12]),
                sum(contracts[12:24]) - sum(contracts[0:12]))

    input_file = open(sys.argv[1], "r")
    print(housing(input_file))
    input_file.close()

The result is a tuple of two positive numbers (the change in housing starts and the change in construction contracts), showing that both rose from 1983 to 1984.
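The list-of-lists option mentioned earlier keeps each month's values together in one record. A minimal sketch of that alternative follows, assuming the same three-column layout; the function name read_records and the sample rows are made up for illustration and are not part of the original program.

```python
import io

def read_records(r):
    '''Return a list of [start, contract, rate] records, one per line of reader r.'''
    records = []
    for line in r:
        # Each line holds three whitespace-separated numbers.
        records.append([float(field) for field in line.split()])
    return records

# Made-up sample rows in the same three-column layout as housing.dat.
sample = io.StringIO(u"100.5 10.25 12.5\n120.0 11.5 12.25\n")
records = read_records(sample)

# Row access is direct: everything about the first month is in one place.
first_month = records[0]

# Column access takes an extra step: pull one field out of every record.
starts = [record[0] for record in records]

print(first_month)
print(starts)
```

Looking up one month is simpler with this layout, but totalling a column needs an extra step, which is one reason the two-list version suits the question asked here. The two-list housing function above remains the version used in the rest of this section.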

This program answered our question, but it could still be improved. Its first shortcoming is that it throws away the interest rate data; although we don't need it right now, someone might in the future, so we should create a third list and store it. Its second shortcoming is that parsing and processing are mixed together in one function; we should instead have one function that reads the data and another that does calculations on it. That way, we can reuse the parsing code every time we have new questions:

    import sys

    def read_housing_data(r):
        '''Read housing data from reader r, returning lists of starts,
        contracts, and rates.'''

        starts = []
        contracts = []
        rates = []

        for line in r:
            start, contract, rate = line.split()
            starts.append(float(start))
            contracts.append(float(contract))
            # Store the rate as a number too, so later questions can use it.
            rates.append(float(rate))

        return (starts, contracts, rates)

    def process_housing_data(starts, contracts):
        '''Return the difference between the housing starts and construction
        contracts in 1983 and in 1984.'''

        # Rows 0-11 are 1983 and rows 12-23 are 1984.
        return (sum(starts[12:24]) - sum(starts[0:12]),
                sum(contracts[12:24]) - sum(contracts[0:12]))

    input_file = open(sys.argv[1], "r")
    starts, contracts, rates = read_housing_data(input_file)
    print(process_housing_data(starts, contracts))
    input_file.close()

[Figure 8.1: A fixed-width file format, with columns Date, Latitude, and Longitude]

Many programs go one step further and separate parsing, processing, and reporting (the printing of results). That way, both the input and output can be used in other programs without having to be rewritten, and programs can process data from other sources [Wil05].
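A three-way split of this kind might look like the sketch below. It is only an illustration under stated assumptions: yearly_change and format_report are hypothetical names, the months_per_year parameter is added so the example can run on a short sample, and the four sample rows are made up (two "years" of two months each).

```python
import io

def read_housing_data(r):
    '''Parsing: turn reader r into lists of starts, contracts, and rates.'''
    starts, contracts, rates = [], [], []
    for line in r:
        start, contract, rate = line.split()
        starts.append(float(start))
        contracts.append(float(contract))
        rates.append(float(rate))
    return starts, contracts, rates

def yearly_change(values, months_per_year=12):
    '''Processing: return the second year's total minus the first year's.'''
    first = sum(values[:months_per_year])
    second = sum(values[months_per_year:2 * months_per_year])
    return second - first

def format_report(starts_change, contracts_change):
    '''Reporting: return a one-line summary; no parsing or arithmetic here.'''
    return ('starts: %+.1f thousand units, contracts: %+.3f million dollars'
            % (starts_change, contracts_change))

# Made-up sample: two "years" of two months each, to keep the example short.
sample = io.StringIO(u"100.0 10.0 12.5\n"
                     u"110.0 11.0 12.5\n"
                     u"120.0 12.5 12.0\n"
                     u"130.0 13.0 12.0\n")

starts, contracts, rates = read_housing_data(sample)
report = format_report(yearly_change(starts, 2), yearly_change(contracts, 2))
print(report)
```

Because format_report only builds a string, a different program could write the same numbers to a file or a web page without touching the parsing or the arithmetic.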