Records with Multiple Fields

Here is some United States housing data for 1983 and 1984, also taken from [Hynnd]. The first column is the monthly housing starts (thousands of units), the second is the total construction contracts (millions of dollars), and the third is the average interest rate for a new home mortgage (percent):

Download fileproc/housing.dat

91.3 11.358 13
96.3 11.355 12.62
134.6 16.100 12.97
135.8 16.315 12.02
174.9 19.205 12.21
173.2 20.263 11.9
161.6 16.885 12.02
176.8 19.441 12.01
154.9 17.379 12.08
159.3 16.028 11.8
136 15.401 11.82
108.3 13.518 11.94
109.1 14.023 11.8
130 14.442 11.78
137.5 17.916 11.56
172.7 17.655 11.55
180.7 21.990 11.68
184 20.036 11.61
162.1 19.224 11.91
147.4 19.367 11.89
148.5 16.923 12.03
152.3 18.413 12.27
126.2 16.616 12.27
98.9 14.220 12.05

This data differs from the previous example, because although there are multiple values per line, in this case each value represents something different. We want to compare total housing starts and construction contracts from 1983 to 1984; for the moment, we don't care about interest rates.

What if we decide to ask more questions about this data in the future? Instead of rereading the data from the file, we can store the data for future use. But how will we store it? We can create twelve lists, one for each month, or two lists, one for housing starts and one for total construction contracts, and store the data by column. Another option is to create a list of lists to keep all the data together. Twelve variables feels like too many, so let's store the data by column using two lists. (A lot of program design is based on what "feels" right. There are no universal hard-and-fast rules for good design; there are only trade-offs and consequences.) Using the two lists to store the data, we compare housing starts and construction contracts from 1983 to 1984:

Download fileproc/housing.py

    import sys

    def housing(r):
        '''Return the difference between the housing starts and construction
        contracts in 1983 and in 1984 from reader r.'''

        # The monthly housing starts, in thousands of units.
        starts = []

        # The construction contracts, in millions of dollars.
        contracts = []

        # Read the file, populating the lists.
        for line in r:
            start, contract, rate = line.split()
            starts.append(float(start))
            contracts.append(float(contract))

        return (sum(starts[12:23]) - sum(starts[0:11]),
                sum(contracts[12:23]) - sum(contracts[0:11]))

    input_file = open(sys.argv[1], "r")
    print(housing(input_file))
    input_file.close()

The result is the tuple (55.799999999999955, 16.875000000000028), showing that both housing starts and construction contracts rose from 1983 to 1984.
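One detail of the slices is worth checking. A slice's end index is exclusive, so starts[0:11] holds eleven values (January through November of 1983) and starts[12:23] holds the matching eleven for 1984; December of each year is left out. If full calendar years are intended, the slices would be [0:12] and [12:24]. The sketch below compares the two readings, assuming the housing starts column of housing.dat as listed above; yearly_difference and months_per_year are hypothetical names introduced only for this illustration, and the code is written for Python 3.

```python
def yearly_difference(values, months_per_year=12):
    '''Return the total for the second year minus the total for the first.

    Hypothetical helper: assumes values holds at least two years of
    monthly figures in chronological order.'''
    first_year = values[0:months_per_year]
    second_year = values[months_per_year:2 * months_per_year]
    return sum(second_year) - sum(first_year)

# Housing starts column of housing.dat (thousands of units), 1983 then 1984.
starts = [91.3, 96.3, 134.6, 135.8, 174.9, 173.2,
          161.6, 176.8, 154.9, 159.3, 136, 108.3,
          109.1, 130, 137.5, 172.7, 180.7, 184,
          162.1, 147.4, 148.5, 152.3, 126.2, 98.9]

# The slices used in housing.py: eleven values each, December left out.
eleven_month = sum(starts[12:23]) - sum(starts[0:11])

# Full calendar years: twelve values each.
full_year = yearly_difference(starts)

print(len(starts[0:11]), len(starts[0:12]))
print(round(eleven_month, 1), round(full_year, 1))
```

With December included, the difference in housing starts comes to about 46.4 thousand units rather than 55.8, so the conclusion that starts rose from 1983 to 1984 holds under either reading.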

This program answered our question, but it could still be improved. Its first shortcoming is that it throws away the interest rate data; although we don't need this right now, someone might in the future, so we should create a third list and store it. The second improvement is to separate the parsing and processing of the data, that is, to have one function that reads the data and another that does calculations on it. That way, we can reuse the parsing code every time we have new questions.

Download fileproc/housing_2.py

    import sys

    def read_housing_data(r):
        '''Read housing data from reader r, returning lists of starts,
        contracts, and rates.'''

        # One list per column of the file.
        starts = []
        contracts = []
        rates = []

        for line in r:
            start, contract, rate = line.split()
            starts.append(float(start))
            contracts.append(float(contract))
            # Convert the rate too, so all three lists hold numbers.
            rates.append(float(rate))

        return (starts, contracts, rates)

    def process_housing_data(starts, contracts):
        '''Return the difference between the housing starts and construction
        contracts in 1983 and in 1984.'''

        return (sum(starts[12:23]) - sum(starts[0:11]),
                sum(contracts[12:23]) - sum(contracts[0:11]))

    input_file = open(sys.argv[1], "r")
    starts, contracts, rates = read_housing_data(input_file)
    print(process_housing_data(starts, contracts))
    input_file.close()

[Figure 8.1: A fixed-width file format (columns: Date, Latitude, Longitude)]
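Keeping the rates list pays off as soon as a new question comes up. As a minimal sketch, suppose we want the average mortgage rate for each year: only a new processing function is needed, and the parsing stays untouched. The function name average_rates and its months_per_year parameter are hypothetical, chosen for this illustration. The rates list is typed in from the third column of housing.dat so the example runs on its own; in the real program it would come from read_housing_data. The code is written for Python 3.

```python
def average_rates(rates, months_per_year=12):
    '''Return the mean mortgage rate for the first year and for the second.

    Hypothetical helper: assumes rates holds at least two full years of
    monthly values, so neither slice is empty.'''
    first_year = rates[0:months_per_year]
    second_year = rates[months_per_year:2 * months_per_year]
    return (sum(first_year) / len(first_year),
            sum(second_year) / len(second_year))

# Third column of housing.dat (percent), 1983 then 1984.
rates = [13.0, 12.62, 12.97, 12.02, 12.21, 11.9,
         12.02, 12.01, 12.08, 11.8, 11.82, 11.94,
         11.8, 11.78, 11.56, 11.55, 11.68, 11.61,
         11.91, 11.89, 12.03, 12.27, 12.27, 12.05]

average_1983, average_1984 = average_rates(rates)
print(round(average_1983, 2), round(average_1984, 2))
```

With the rate column as listed, the 1984 average comes out lower than the 1983 average.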

Many programs go one step further and separate parsing, processing, and reporting (the printing of results). That way, both the input and output can be used in other programs without having to be rewritten, and programs can process data from other sources [Wil05].
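A rough sketch of that three-way split is shown here. The names parse, process, and report, and the months_per_year parameter, are hypothetical and not taken from the listings above. Two choices are assumptions made for illustration: records are kept together as a list of tuples (the "list of lists" layout) rather than by column, and each year is taken to be months_per_year consecutive rows. The code is written for Python 3 and uses io.StringIO to stand in for real files.

```python
import io

def parse(reader):
    '''Parsing: return a list of (start, contract, rate) tuples of floats,
    one per non-blank line of reader.'''
    return [tuple(float(field) for field in line.split())
            for line in reader if line.strip()]

def process(records, months_per_year=12):
    '''Processing: return the second-year total minus the first-year total
    for housing starts and for construction contracts.'''
    first = records[0:months_per_year]
    second = records[months_per_year:2 * months_per_year]
    start_diff = sum(r[0] for r in second) - sum(r[0] for r in first)
    contract_diff = sum(r[1] for r in second) - sum(r[1] for r in first)
    return (start_diff, contract_diff)

def report(differences, writer):
    '''Reporting: write a readable summary to any file-like object.'''
    start_diff, contract_diff = differences
    writer.write("Change in housing starts: %.1f\n" % start_diff)
    writer.write("Change in construction contracts: %.3f\n" % contract_diff)

# Made-up sample with two "years" of two months each, to keep it short.
sample = io.StringIO("100.0 10.0 12.0\n"
                     "110.0 11.0 12.5\n"
                     "120.0 12.5 11.0\n"
                     "130.0 13.5 11.5\n")
output = io.StringIO()
report(process(parse(sample), months_per_year=2), output)
print(output.getvalue())
```

Because report only needs an object with a write method, the same function can write to sys.stdout, an open file, or an in-memory buffer, and parse accepts anything that yields lines.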
