Looking Ahead

Let's add one final complication. Suppose that molecules didn't have END markers but instead just a COMPND line followed by one or more ATOM lines. How would we read multiple molecules from a single file in that case?

At first glance, it doesn't seem much different from the problem we just solved: read_molecule could extract the molecule's name from the COMPND line and then read ATOM lines until it got either an empty string signaling the end of the file or another COMPND line signaling the start of the next molecule. But once it has read that COMPND line, the line isn't available for the next call to read_molecule, so how can we get the name of the second molecule (and all the ones following it)?
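To see why the consumed COMPND line is a problem, here is a minimal sketch of a reader *without* look-ahead. The function name naive_read_molecule and the sample input are made up for illustration; the coordinates are sample values only. The reader works for the first molecule, but the second call starts reading after the METHANOL header and mistakes an ATOM line for a header.

```python
import io

# Illustrative input in the format described above: no END markers, just a
# COMPND line followed by ATOM lines. The coordinates are sample values.
SAMPLE = (
    'COMPND AMMONIA\n'
    'ATOM 1 N 0.257 -0.363 0.000\n'
    'ATOM 2 H 0.257 0.727 0.000\n'
    'COMPND METHANOL\n'
    'ATOM 1 C -0.748 -0.015 0.024\n'
    'ATOM 2 O 0.558 0.420 -0.278\n'
)

def naive_read_molecule(r):
    '''Hypothetical reader WITHOUT look-ahead: it reads its own first line.'''
    line = r.readline()
    if not line:
        return None                     # end of file: no more molecules
    molecule = [line.split()[1]]        # name from what it assumes is COMPND
    line = r.readline()
    while line and not line.startswith('COMPND'):
        key, num, atom_type, x, y, z = line.split()
        molecule.append((atom_type, x, y, z))
        line = r.readline()
    # Bug: the COMPND line that stopped the loop is simply dropped here.
    return molecule

r = io.StringIO(SAMPLE)
first = naive_read_molecule(r)
second = naive_read_molecule(r)
print(first[0])    # AMMONIA
print(second[0])   # 1  (the METHANOL name is gone; an ATOM line was misread)
```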

To solve this problem, our functions must always "look ahead" one line. Let's start with the function that reads multiple molecules:

Download fileproc/lookahead.py

    def read_all_molecules(r):
        '''Read zero or more molecules from reader r,
        returning a list of the molecules read.'''
        result = []
        line = r.readline()    # look ahead: read the first line
        while line:
            molecule, line = read_molecule(r, line)
            result.append(molecule)
        return result

This function begins by reading the first line of the file. Provided that line is not the empty string (that is, the file being read is not empty), it passes both the stream to read from and the line into read_molecule, which is supposed to return two things: the next molecule in the file and the first line immediately after the end of that molecule (or an empty string if the end of file has been reached).

Figure 8.2: A PDB file

This simple description is enough to get us started writing the read_molecule function. The first thing it has to do is check that line is actually the start of a molecule. It then reads lines from the stream one at a time, looking for one of three situations:

• The end of file, which signals the end of both the current molecule and the file

• Another COMPND line, which signals the end of this molecule and the start of the next one

• An ATOM line, which describes an atom to be added to the current molecule

The most important thing is that when this function returns, it returns both the molecule and the next line so that its caller can keep processing. The result is probably the most complicated function we have seen so far, but once you understand the idea of looking ahead one line, the code itself is straightforward.
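The read_molecule listing in this section takes the molecule's name straight from line without verifying it. A variant that performs the check described above and spells out the three situations as separate branches could look like the sketch below. The name read_molecule_checked is hypothetical, and raising ValueError for a non-COMPND first line or for any unrecognized line is our own assumption, not part of the original design.

```python
import io

def read_molecule_checked(r, line):
    '''Hypothetical variant of read_molecule: like the original, it returns
    (molecule, next_line), but it first verifies that line is a COMPND line
    and handles each of the three situations explicitly.'''
    if not line.startswith('COMPND'):
        raise ValueError('expected a COMPND line, got %r' % line)
    molecule = [line.split()[1]]
    line = r.readline()
    while True:
        if not line:
            # Situation 1: end of file ends both the molecule and the file.
            return molecule, line
        if line.startswith('COMPND'):
            # Situation 2: the next molecule starts; hand its line back.
            return molecule, line
        if line.startswith('ATOM'):
            # Situation 3: an atom belonging to the current molecule.
            key, num, atom_type, x, y, z = line.split()
            molecule.append((atom_type, x, y, z))
        else:
            # Assumption: any other kind of line is treated as an error.
            raise ValueError('unexpected line %r' % line)
        line = r.readline()

r = io.StringIO('ATOM 1 N 0.257 -0.363 0.000\nCOMPND METHANOL\n')
molecule, next_line = read_molecule_checked(r, 'COMPND AMMONIA\n')
print(molecule)          # ['AMMONIA', ('N', '0.257', '-0.363', '0.000')]
print(repr(next_line))   # 'COMPND METHANOL\n'
```

Either way, the line that ends the molecule is returned rather than discarded, which is what lets the caller start the next molecule.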

Figure 8.3: Looking ahead

Download fileproc/lookahead_2.py

    def read_molecule(r, line):
        '''Read a molecule from reader r. The variable 'line'
        is the first line of the molecule to be read; the
        result is the molecule, and the first line after it
        (or the empty string if the end of file has been
        reached).'''
        fields = line.split()
        molecule = [fields[1]]    # the name from the COMPND line
        line = r.readline()
        # Stop at end of file ('') or at the next molecule's COMPND line.
        while line and not line.startswith('COMPND'):
            fields = line.split()
            key, num, atom_type, x, y, z = fields
            molecule.append((atom_type, x, y, z))
            line = r.readline()
        # Hand the look-ahead line back to the caller.
        return molecule, line
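To check that the two functions cooperate, the sketch below repeats them in compact form and runs them on a small made-up input held in an io.StringIO object. The sample data is illustrative only; an open file object should work the same way, since all the functions rely on is readline().

```python
import io

def read_all_molecules(r):
    '''Read zero or more molecules from reader r (as in the text).'''
    result = []
    line = r.readline()            # prime the look-ahead
    while line:
        molecule, line = read_molecule(r, line)
        result.append(molecule)
    return result

def read_molecule(r, line):
    '''Return (molecule, first line after it), as in the text.'''
    molecule = [line.split()[1]]
    line = r.readline()
    while line and not line.startswith('COMPND'):
        key, num, atom_type, x, y, z = line.split()
        molecule.append((atom_type, x, y, z))
        line = r.readline()
    return molecule, line

# Made-up sample input with no END markers.
data = io.StringIO(
    'COMPND AMMONIA\n'
    'ATOM 1 N 0.257 -0.363 0.000\n'
    'ATOM 2 H 0.257 0.727 0.000\n'
    'COMPND METHANOL\n'
    'ATOM 1 C -0.748 -0.015 0.024\n'
    'ATOM 2 O 0.558 0.420 -0.278\n'
)
molecules = read_all_molecules(data)
for molecule in molecules:
    print(molecule[0], len(molecule) - 1, 'atoms')
# AMMONIA 2 atoms
# METHANOL 2 atoms
```

An empty input yields an empty list, because the first readline() returns '' and the while loop never runs.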
