Let's add one final complication. Suppose that molecules didn't have END markers but instead just a COMPND line followed by one or more ATOM lines. How would we read multiple molecules from a single file in that case?
At first glance, it doesn't seem much different from the problem we just solved: read_molecule could extract the molecule's name from the COMPND line and then read ATOM lines until it got either an empty string signaling the end of the file or another COMPND line signaling the start of the next molecule. But once it has read that COMPND line, the line isn't available for the next call to read_molecule, so how can we get the name of the second molecule (and all the ones following it)?
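The difficulty is easy to reproduce. The following is a minimal sketch, assuming `io.StringIO` as a stand-in for an open file and a made-up two-molecule sample (the coordinates are illustrative, not real data). It shows that once a loop has read the second COMPND line, the next `readline` call on the same reader starts at the line after it.

```python
import io

# Hypothetical sample data: two molecules with no END markers.
sample = (
    "COMPND      AMMONIA\n"
    "ATOM      1  N  0.257  -0.363   0.000\n"
    "COMPND      METHANOL\n"
    "ATOM      1  C  -0.748  -0.015   0.024\n"
)

r = io.StringIO(sample)
r.readline()                  # COMPND line of the first molecule
line = r.readline()
while line and not line.startswith('COMPND'):
    line = r.readline()       # skip the first molecule's ATOM lines

# 'line' now holds the second COMPND line, and the reader has moved past it.
consumed = line
next_line = r.readline()
print(repr(consumed))
print(repr(next_line))
```

Unless the consumed line is handed on to whatever reads the second molecule, its name is lost.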
To solve this problem, our functions must always "look ahead" one line. Let's start with the function that reads multiple molecules:
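def read_all_molecules(r):
    '''Read zero or more molecules from reader r,
    returning a list of the molecules read.'''
    result = []
    line = r.readline()    # look ahead: the first line of the file
    while line:            # the empty string means end of file
        molecule, line = read_molecule(r, line)
        result.append(molecule)
    return result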
This function begins by reading the first line of the file. Provided that line is not the empty string (that is, the file being read is not empty), it passes both the stream to read from and the line into read_molecule, which is supposed to return two things: the next molecule in the file and the first line immediately after the end of that molecule (or an empty string if the end of file has been reached).
Figure 8.2: A PDB file
This simple description is enough to get us started writing the read_molecule function. The first thing it has to do is check that line is actually the start of a molecule. It then reads lines from the stream one at a time, looking for one of three situations:
• The end of file, which signals the end of both the current molecule and the file
• Another COMPND line, which signals the end of this molecule and the start of the next one
• An ATOM line, which is to be added to the current molecule
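The three situations can be sketched as a small decision function. The helper name `classify` and its return strings are hypothetical, introduced only for illustration. The sketch also assumes that any non-empty line that does not start with COMPND is an ATOM line.

```python
def classify(line):
    """Return which of the three situations a freshly read line represents."""
    if not line:                       # readline() returns '' at end of file
        return 'end of file'
    if line.startswith('COMPND'):      # the next molecule begins here
        return 'next molecule'
    return 'atom'                      # assumed: anything else is an ATOM line


print(classify(''))
print(classify('COMPND      METHANOL\n'))
print(classify('ATOM      1  C  -0.748  -0.015   0.024\n'))
```

Only the last case extends the current molecule. The first two both end it, which is why they can share a single loop condition.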
The most important thing is that when this function returns, it returns both the molecule and the next line so that its caller can keep processing. The result is probably the most complicated function we have seen so far, but understanding the idea behind it will help you understand how it works.
Figure 8.3: Looking ahead
def read_molecule(r, line):
    '''Read a molecule from reader r.  The variable 'line' is the first line
    of the molecule to be read; the result is the molecule, and the first
    line after it (or the empty string if the end of file has been reached).'''
    fields = line.split()
    molecule = [fields[1]]    # the molecule's name follows COMPND
    line = r.readline()
    # Stop at end of file ('') or at the next molecule's COMPND line.
    while line and not line.startswith('COMPND'):
        fields = line.split()
        key, num, type, x, y, z = fields
        molecule.append((type, x, y, z))
        line = r.readline()
    return molecule, line
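To see the look-ahead technique run end to end, here is a self-contained sketch. It rests on several assumptions: `io.StringIO` stands in for an open file, the sample text and its coordinates are made up, the function that reads multiple molecules is called `read_all_molecules`, and each molecule is stored as a list whose first element is the name taken from the second field of the COMPND line.

```python
import io


def read_molecule(r, line):
    """Read one molecule; 'line' is its COMPND line.
    Return the molecule and the first line after it ('' at end of file)."""
    fields = line.split()
    molecule = [fields[1]]             # assumed layout: COMPND <name>
    line = r.readline()
    while line and not line.startswith('COMPND'):
        key, num, kind, x, y, z = line.split()
        molecule.append((kind, x, y, z))
        line = r.readline()
    return molecule, line


def read_all_molecules(r):
    """Read zero or more molecules from reader r and return them as a list."""
    result = []
    line = r.readline()                # look ahead one line
    while line:
        molecule, line = read_molecule(r, line)
        result.append(molecule)
    return result


# Hypothetical sample data: two molecules with no END markers.
sample = (
    "COMPND      AMMONIA\n"
    "ATOM      1  N  0.257  -0.363   0.000\n"
    "ATOM      2  H  0.257   0.727   0.000\n"
    "COMPND      METHANOL\n"
    "ATOM      1  C  -0.748  -0.015   0.024\n"
)

molecules = read_all_molecules(io.StringIO(sample))
for molecule in molecules:
    print(molecule[0], len(molecule) - 1, 'atom(s)')
```

Because every call returns the line it read but could not use, the caller always holds the next molecule's COMPND line and no name is lost. The sketch leaves coordinates as strings and does not handle blank lines between molecules.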