File Processing

The exact details of file-processing differ substantially among programming languages, but virtually all languages share certain underlying file manipulation concepts. First, we need some way to associate a file on disk with a variable in a program. This process is called opening a file. Once a file has been opened, it is manipulated through the variable we assign to it.

Second, we need a set of operations that can manipulate the file variable. At the very least, this includes operations that allow us to read the information from a file and write new information to a file. Typically, the reading and writing operations for text files are similar to the operations for text-based, interactive input and output.

Finally, when we are finished with a file, it is closed. Closing a file makes sure that any bookkeeping that was necessary to maintain the correspondence between the file on disk and the file variable is finished up. For example, if you write information to a file variable, the changes might not show up on the disk version until the file has been closed.

This idea of opening and closing files is closely related to how you might work with files in an application program like a word processor. However, the concepts are not exactly the same. When you open a file in a program like Microsoft Word, the file is actually read from the disk and stored into RAM. In programming terminology, the file is opened for reading and the contents of the file are then read into memory via file reading operations. At this point, the file is closed (again in the programming sense). As you "edit the file," you are really making changes to data in memory, not the file itself. The changes will not show up in the file on the disk until you tell the application to "save" it.

Saving a file also involves a multi-step process. First, the original file on the disk is reopened, this time in a mode that allows it to store information—the file on disk is opened for writing. Doing so actually erases the old contents of the file. File writing operations are then used to copy the current contents of the in-memory version into the new file on the disk. From your perspective, it appears that you have edited an existing file. From the program's perspective, you have actually opened a file, read its contents into memory, closed the file, created a new file (having the same name), written the (modified) contents of memory into the new file, and closed the new file.
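The open, edit, and save cycle just described can be sketched in a few lines of Python. This is only an illustration, not how any particular word processor works: the function name edit_and_save and its parameters are made up for the example, and the "editing" is reduced to a simple text substitution. The code avoids print so that it should run under both Python 2 and Python 3.

```python
def edit_and_save(path, old, new):
    """Mimic a word processor's open/edit/save cycle on a text file."""
    # Open for reading, copy the contents into memory, and close.
    infile = open(path, "r")
    text = infile.read()
    infile.close()

    # "Edit" the in-memory copy; the file on disk is untouched so far.
    text = text.replace(old, new)

    # "Save": reopen for writing (which erases the old contents),
    # write the modified text, and close.
    outfile = open(path, "w")
    outfile.write(text)
    outfile.close()
    return text
```

Note that between the first close and the second open, the only up-to-date copy of the data lives in the variable text.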

Working with text files is easy in Python. The first step is to associate a file with a variable using the open function.

<filevar> = open(<name>, <mode>)

Here name is a string that provides the name of the file on the disk. The mode parameter is either the string "r" or "w" depending on whether we intend to read from the file or write to the file.

For example, to open a file on our disk called "numbers.dat" for reading, we could use a statement like the following.

infile = open("numbers.dat", "r")

Now we can use the variable infile to read the contents of numbers.dat from the disk. Python provides three related operations for reading information from a file:

<filevar>.read()

<filevar>.readline()

<filevar>.readlines()

The read operation returns the entire contents of the file as a single string. If the file contains more than one line of text, the resulting string has embedded newline characters between the lines. Here's an example program that prints the contents of a file to the screen.

# printfile.py
#    Prints a file to the screen.

def main():
    fname = raw_input("Enter filename: ")
    infile = open(fname, 'r')
    data = infile.read()
    infile.close()
    print data

main()

The program first prompts the user for a filename and then opens the file for reading through the variable infile. You could use any name for the variable; I used infile to emphasize that the file was being used for input. The entire contents of the file are then read as one large string and stored in the variable data. Printing data causes the contents to be displayed.
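To see the embedded newline characters for yourself, it can help to wrap the read operation in a small function and inspect the string it returns. The sketch below is one way to do that; the names read_whole_file and count_newlines are invented for this example, and the code is written so that it should run under both Python 2 and Python 3.

```python
def read_whole_file(path):
    """Return the entire contents of the file at path as one string."""
    infile = open(path, "r")
    data = infile.read()
    infile.close()
    return data

def count_newlines(path):
    """Count the newline characters embedded in the file's contents."""
    data = read_whole_file(path)
    return data.count("\n")
```

For a file whose lines each end with a newline, count_newlines gives the number of lines in the file.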

The readline operation can be used to read one line from a file; that is, it reads all the characters up through the next newline character. Each time it is called, readline returns the next line from the file. This is analogous to raw_input, which reads characters interactively until the user hits the <Enter> key; each call to raw_input gets another line from the user. One thing to keep in mind, however, is that the string returned by readline ends with a newline character (the one exception being the last line of a file that lacks a final newline), whereas raw_input discards the newline character.

As a quick example, this fragment of code prints out the first five lines of a file.

for i in range(5):
    line = infile.readline()
    print line[:-1]

Notice the use of slicing to strip off the newline character at the end of the line. Since print automatically jumps to the next line (i.e., it outputs a newline), printing with the explicit newline at the end would put an extra blank line of output between the lines of the file.
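The slice line[:-1] assumes that every line really does end with a newline. If the last line of a file has no final newline, the slice chops off a real character instead. A slightly more defensive sketch, assuming nothing about the file beyond its being text, uses rstrip to remove only a trailing newline and stops when readline returns the empty string, which is how it signals the end of the file. The function name first_lines is made up for this example, and the code should run under both Python 2 and Python 3.

```python
def first_lines(path, n):
    """Return up to the first n lines of a file, without trailing newlines."""
    infile = open(path, "r")
    lines = []
    for i in range(n):
        line = infile.readline()
        if line == "":          # empty string: no more lines in the file
            break
        lines.append(line.rstrip("\n"))
    infile.close()
    return lines
```

A blank line in the file comes back from readline as "\n", not as the empty string, so it does not end the loop early.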

As an alternative to readline, when you want to loop through all the (remaining) lines of a file, you can use readlines. This operation returns a sequence of strings representing the lines of the file. Used with a for loop, it is a particularly handy way to process each line of a file.

infile = open(someFile, 'r')
for line in infile.readlines():
    # process the line here
infile.close()
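As one concrete way of filling in the "process the line" step, here is a sketch that adds up the numbers in a file. It assumes the file holds one number per line, possibly with blank lines mixed in; the text does not say how numbers.dat is actually laid out, so treat that format as an assumption. The function name sum_numbers is invented for the example, and the code should run under both Python 2 and Python 3.

```python
def sum_numbers(path):
    """Add up the numbers in a file that holds one number per line."""
    infile = open(path, "r")
    total = 0.0
    for line in infile.readlines():
        line = line.strip()      # drop the newline and surrounding spaces
        if line != "":           # skip blank lines
            total = total + float(line)
    infile.close()
    return total
```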

Opening a file for writing prepares that file to receive data. If no file with the given name exists, a new file will be created. A word of warning: if a file with the given name does exist, Python will delete it and create a new, empty file. When writing to a file, make sure you do not clobber any files you will need later! Here is an example of opening a file for output.
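One simple way to guard against clobbering a file is to check whether it exists before opening it for writing. The sketch below uses os.path.exists from the standard library; the function name open_for_writing_safely is made up for this example, and returning None for an existing file is just one possible design. The code should run under both Python 2 and Python 3.

```python
import os

def open_for_writing_safely(path):
    """Open path for writing only if no file with that name exists.

    Returns the new file object, or None if the file is already there.
    """
    if os.path.exists(path):
        return None              # refuse to overwrite an existing file
    return open(path, "w")
```

This check is not airtight: another program could create the file in the moment between the test and the call to open. For everyday scripts, though, it catches the common mistake of reusing a filename by accident.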

outfile = open("mydata.out", "w")

We can put data into a file using the write operation.

<filevar>.write(<string>)

This is similar to print, except that write is a little less flexible. The write operation takes a single parameter, which must be a string, and writes that string to the file. If you want to start a new line in the file, you must explicitly provide the newline character.

Here's a silly example that writes two lines to a file.

outfile = open("example.out", 'w')
count = 1
outfile.write("This is the first line\n")
count = count + 1
outfile.write("This is line number %d" % (count))
outfile.close()

Notice the use of string formatting to write out the value of the variable count. If you want to output something that is not a string, you must first convert it; the string formatting operator is often a handy way to do this. This code will produce a file on disk called "example.out" containing the following two lines:

This is the first line
This is line number 2

If "example.out" existed before executing this fragment, its old contents were destroyed.
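The same ideas extend to writing a whole sequence of values. The sketch below writes a list of integers to a file, one per line, using string formatting to convert each number and supplying the newline explicitly, since write will not add one. The function name write_numbers is invented for this example, and the code should run under both Python 2 and Python 3.

```python
def write_numbers(path, numbers):
    """Write each integer in numbers to the file at path, one per line."""
    outfile = open(path, "w")    # erases any existing file of this name
    for n in numbers:
        # write needs a string, so format the number and add the newline
        outfile.write("%d\n" % n)
    outfile.close()
```

Calling write_numbers("mydata.out", [1, 2, 3]) would leave a three-line file on disk, replacing any earlier file called mydata.out.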
