Navigating the File System with the os Module

The os module and its submodule os.path are among the most helpful parts of Python for everyday tasks that have to run on many different systems. If you often write scripts and programs on either Windows or Unix that must still work on the other operating system, you know from Chapter 8 that Python does much of the work of hiding the differences between the two.

In this chapter, we're going to completely ignore a lot of what the os module can do (ranging from process control to getting system information) and just focus on some of the functions useful for working with files and directories. Some things you've been introduced to already, while others are new.

One of the difficult and annoying points about writing cross-platform scripts is the fact that directory names are separated by backslashes (\) under Windows, but forward slashes (/) under Unix. Even breaking a full path down into its components is irritatingly complicated if you want your code to work under both operating systems.

Furthermore, Python, like many other programming languages, uses the backslash character to introduce escape sequences, such as \n for a newline. This complicates any script that writes Windows file paths as string literals, because a backslash followed by certain letters is silently turned into a different character.
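The backslash problem is easiest to see with a concrete literal. The following is a minimal sketch; the path C:\new_data\table.txt is made up for illustration and does not need to exist.

```python
# The same Windows-style path written three ways (the path is hypothetical).
broken = "C:\new_data\table.txt"     # \n and \t are read as newline and tab
escaped = "C:\\new_data\\table.txt"  # every backslash doubled
raw = r"C:\new_data\table.txt"       # raw string: backslashes left alone

print(repr(broken))    # shows the newline and tab that crept in
print(escaped == raw)  # the two safe spellings produce the same string
```

Raw strings are usually the more readable of the two safe spellings, although a raw string cannot end in a single backslash.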

With Python's os.path module, however, you get some handy functions that split and join path names for you with the right separator characters, and they work correctly on any OS that Python runs on (including the Mac). You can also call a single function to iterate through a directory structure and have it call another function of your choosing on each directory it finds in the hierarchy. You'll be seeing a lot of that function in the examples that follow, but first let's look at an overview of some of the useful functions in the os and os.path modules that you'll be using.
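Before the overview, here is a brief sketch of what that splitting and joining looks like. It relies on the fact that os.path is an alias for either the ntpath module (on Windows) or the posixpath module (on Unix and the Mac), and it calls both directly so that the output is the same wherever you run it. In your own scripts you would normally just call os.path.join() and os.path.split(). The directory and file names are invented for illustration.

```python
import ntpath      # what os.path refers to on Windows
import posixpath   # what os.path refers to on Unix and the Mac

# Joining: each module inserts its own separator between components.
win_path = ntpath.join("C:\\", "Users", "alice", "notes.txt")
unix_path = posixpath.join("/home", "alice", "notes.txt")
print(win_path)
print(unix_path)

# Splitting: each call returns a (head, tail) tuple, where tail is the
# last component and head is everything in front of it.
win_head, win_tail = ntpath.split(win_path)
unix_head, unix_tail = posixpath.split(unix_path)
print(win_head, win_tail)
print(unix_head, unix_tail)
```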

Function Name, as Called

Description

os.getcwd() Returns the current working directory as a string. You can think of it as the starting point against which every relative path is resolved.

os.listdir(directory) Returns a list of the names (not the full paths) of the files and subdirectories stored in the named directory. You can then run os.stat() on the individual entries, for example, to determine which are files and which are subdirectories.

os.stat(path) Returns a tuple-like result of numbers that tell you nearly everything you could need to know about a file (or directory). These numbers are taken from the structure filled in by the POSIX C function of the same name, and they have the following meanings, in this order (some are dummy values under Windows, but they're in the same places!):

st_mode: file type and permission bits

st_ino: inode number (Unix)

st_dev: device number

st_nlink: number of hard links (Unix)

st_uid: user ID of the owner

st_gid: group ID of the owner

st_size: size of the file in bytes

st_atime: time of last access

st_mtime: time of last modification

st_ctime: time of the last metadata change on Unix; time of creation on Windows

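os.path.split(path) Splits the path into two parts, a head and a tail, using the separator rules of the current operating system. The tail is the last name component and the head is everything in front of it; the function does not break the path into every individual component. Returns a tuple, not a list. This always surprises me.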

os.path.join(component, ...) Joins one or more name components, passed as separate arguments, into a path appropriate to the current operating system.


os.path.normcase(path)

Normalizes the case of a path. Under Unix, this has no effect because filenames are case-sensitive. Under Windows, where the OS silently ignores case when comparing filenames, it's useful to run normcase on both paths before comparing them, so that Python treats them the way the operating system would: two paths that differ only in capitalization compare as equal. Under Windows, the function returns the path in all lowercase and converts any forward slashes into backslashes.

os.path.walk(start, function, arg)

This is a brilliant function that iterates down through a directory tree starting at start. For each directory, it calls the function function like this: function(arg, dir, files), where arg is any arbitrary argument (usually something that is modified, like a dictionary), dir is the name of the current directory, and files is a list containing the names of all the files and subdirectories in that directory. If you modify the files list in place by removing some subdirectories, you can prevent os.path.walk() from iterating into those subdirectories. Note that os.path.walk() exists only in Python 2; Python 3 removed it in favor of the os.walk() generator, which offers the same traversal without a callback.
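Because os.path.walk() is available only in Python 2, the following sketch shows the same kind of traversal with os.walk(), its Python 3 counterpart. This is a swapped-in technique, not the callback style described in the table. The helper count_files, the directory name skipme, and the file names are all hypothetical; the tree is built in a temporary directory so that the sketch can run anywhere.

```python
import os
import tempfile

def count_files(start, skip=("skipme",)):
    """Map each visited directory to its file count, skipping names in skip."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(start):
        # Editing dirnames in place keeps os.walk() out of those directories,
        # much like trimming the files list handed to an os.path.walk() callback.
        dirnames[:] = [d for d in dirnames if d not in skip]
        counts[dirpath] = len(filenames)
    return counts

# Build a small throwaway tree: one file at the top and one in each subdirectory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep"))
    os.makedirs(os.path.join(root, "skipme"))
    for name in ("a.txt", os.path.join("keep", "b.txt"), os.path.join("skipme", "c.txt")):
        with open(os.path.join(root, name), "w") as handle:
            handle.write("x")
    counts = count_files(root)
    # Report directories relative to the root so the output is readable.
    relative = {os.path.relpath(d, root): n for d, n in counts.items()}

print(relative)
```

The slice assignment dirnames[:] = ... matters here: rebinding the name with dirnames = ... would leave the list that os.walk() holds untouched, and the skipped directory would still be visited.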

There are more functions where those came from, but these are the ones used in the example code that follows. You will likely use these functions far more than any others in these modules. Many other useful functions can be found in the Python module documentation for os and os.path.
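As a quick warm-up, the listing, stat, and case-normalizing entries from the table can be sketched together. The file and directory names below are invented, and everything is created in a temporary directory. The normcase comparison calls ntpath directly (the module behind os.path on Windows) so that it behaves the same on any platform; this is an assumption made for the demonstration, and in a script running on Windows you would simply call os.path.normcase().

```python
import ntpath
import os
import stat
import tempfile

print(os.getcwd())  # wherever the script was started from

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "subdir"))
    with open(os.path.join(root, "data.txt"), "w") as handle:
        handle.write("hello")

    kinds = {}
    for name in sorted(os.listdir(root)):
        # listdir() returns bare names, so rebuild the full path before stat().
        info = os.stat(os.path.join(root, name))
        kinds[name] = "directory" if stat.S_ISDIR(info.st_mode) else "file"
    size = os.stat(os.path.join(root, "data.txt")).st_size

print(kinds, size)

# Two spellings of one Windows path compare equal once both are normalized.
same = ntpath.normcase("C:/Data/Report.TXT") == ntpath.normcase(r"c:\data\report.txt")
print(same)
```

The stat module's S_ISDIR() helper decodes the file-type bits in st_mode, which saves you from testing those bits by hand.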
