The filename pattern is stored in the global variable OPTIONS.file_pattern. By default this is set to an empty string and so it will match all file names. This variable is controlled by the command-line parsing class, which I'm going to talk about later in the chapter. For the time being, just note that it can be set to any value by using the -p or --pattern option.
I need to create a list of directories and all subdirectories recursively so that I can search for the log files in them. Users are going to supply me with a list of top-level directories, which I need to explode into a full tree of all sub- and sub-sub directories.
The list of arguments is going to be stored in the ARGS variable by the OptionParser class. There is a really handy function in Python's os library called walk. It recursively builds a list of files in each directory and all subdirectories.
Let's set up a simple directory structure and see how the os.walk function works:
This will produce a three-level directory structure:
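An equivalent tree can also be built from Python itself. The following is only a sketch: the layout is inferred from the os.walk output in Listing 7-5, and the temporary base directory is an assumption made so that the example does not touch the current directory.

```python
import os
import tempfile

# Hypothetical scratch location; the text builds the tree in the current directory.
base = tempfile.mkdtemp()

# Two top-level directories, each with two subdirectories,
# each of which holds a single sub_sub_dir.
for top in ("top_dir_1", "top_dir_2"):
    for sub in ("sub_dir_1", "sub_dir_2"):
        os.makedirs(os.path.join(base, top, sub, "sub_sub_dir"))

created = sorted(os.listdir(base))
print(created)
```

os.makedirs creates all the intermediate directories in one call, so each leaf path needs only one line.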
Now we can use os.walk to generate the same output, as shown in Listing 7-5.
Listing 7-5. Recursively retrieving a list of directories with os.walk
$ python
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for d in os.walk('.'):
...     print d
...
('.', ['top_dir_1', 'top_dir_2'], [])
('./top_dir_1', ['sub_dir_1', 'sub_dir_2'], [])
('./top_dir_1/sub_dir_1', ['sub_sub_dir'], [])
('./top_dir_1/sub_dir_1/sub_sub_dir', [], [])
('./top_dir_1/sub_dir_2', ['sub_sub_dir'], [])
('./top_dir_1/sub_dir_2/sub_sub_dir', [], [])
('./top_dir_2', ['sub_dir_1', 'sub_dir_2'], [])
('./top_dir_2/sub_dir_1', ['sub_sub_dir'], [])
('./top_dir_2/sub_dir_1/sub_sub_dir', [], [])
('./top_dir_2/sub_dir_2', ['sub_sub_dir'], [])
('./top_dir_2/sub_dir_2/sub_sub_dir', [], [])
>>> os.walk('.')
<generator object walk at 0x1004920a0>
>>>
As you can see, a call to os.walk returns a generator object. I will talk about generators in more detail later in this chapter, but for now, note that they are objects that you can iterate through just like any normal Python list or tuple object.
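One difference from a list is worth seeing early: a generator can be consumed only once. The sketch below illustrates this on a small scratch tree; the temporary directory and the variable names are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch tree, so the result does not depend on where this runs.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "top_dir_1", "sub_dir_1"))

walker = os.walk(base)        # creates the generator; no directory is read yet
first = next(walker)          # the first item describes the starting directory
rest = list(walker)           # consume everything that is left
again = list(walker)          # an exhausted generator yields nothing more

print(first)
print(len(rest), again)
```

If you need to go through the results more than once, either call os.walk again or store the results in a list first.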
Each item that the generator yields is a three-tuple with the following entries:
The directory path: The current directory whose contents are exposed in the next two variables.
Directory names: A list of directory names in the directory path. This list excludes the '.' and '..' directories.
File names: A list of the file names in the directory path.
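To make the three entries concrete, here is a small sketch that unpacks each tuple into separate variables. The scratch tree, the file name access.log, and the variable names are all assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch tree: logs/ contains one file and one subdirectory.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "logs", "archive"))
open(os.path.join(base, "logs", "access.log"), "w").close()

results = {}
for dirpath, dirnames, filenames in os.walk(base):
    # Store paths relative to the scratch directory to keep the output short.
    results[os.path.relpath(dirpath, base)] = (sorted(dirnames), sorted(filenames))

for path in sorted(results):
    print(path, results[path])
```

Note that the names in the second and third entries are bare names, not full paths; to get a full path you need to join them with the directory path.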
By default os.walk will not follow symbolic links that point to directories. To follow symbolic links, you can set the followlinks parameter to True, which will instruct os.walk to follow all symbolic links that it comes across while scanning the directory tree.
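The effect of followlinks can be checked with a short experiment. This is a sketch that assumes a platform where os.symlink is available (it may need extra privileges on Windows); the directory names are hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# A real directory that lives outside the tree being walked.
real = os.path.join(base, "real_logs")
os.makedirs(os.path.join(real, "2009"))

# The tree being walked contains only a symbolic link to that directory.
tree = os.path.join(base, "tree")
os.makedirs(tree)
os.symlink(real, os.path.join(tree, "linked_logs"))

default_walk = [root for root, dirs, files in os.walk(tree)]
following = [root for root, dirs, files in os.walk(tree, followlinks=True)]

print(len(default_walk))   # only the starting directory is visited
print(len(following))      # the link and its subdirectory are visited as well
```

Be careful with this setting: os.walk does not keep track of the directories it has already visited, so a link that points back to one of its own parent directories can lead to infinite recursion.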
I'm only interested in the directory listing, as I'm going to use a different function to filter out the files that will be processed and analyzed. Collecting only the first element of the three-tuple result, I can build the list of directories. So to build a recursive list of all directories from the list of top-level directories that are supplied as an argument list, I would write the following:
for dir in ARGS:
    for root, dirs, files in os.walk(dir):
        DIRS.append(root)
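The same loop can be wrapped in a function and tried out in isolation. In this sketch the function name collect_dirs and the scratch directories are assumptions; in the chapter's code the results go into the DIRS list and the input comes from ARGS.

```python
import os
import tempfile

def collect_dirs(top_level_dirs):
    """Return every directory at or below the given top-level directories."""
    found = []
    for top in top_level_dirs:
        for root, dirs, files in os.walk(top):
            found.append(root)
    return found

# Hypothetical input: two top-level directories, one of them with a subdirectory.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b"))
os.makedirs(os.path.join(base, "c"))

all_dirs = collect_dirs([os.path.join(base, "a"), os.path.join(base, "c")])
missing = collect_dirs([os.path.join(base, "no_such_dir")])

print(all_dirs)
print(missing)
```

By default os.walk ignores errors raised while listing a directory, so a top-level directory that does not exist simply contributes nothing to the result.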
Now the DIRS list contains all the directories that I need to search for log files. I need to go through this list and find every file whose name satisfies two conditions: it contains either LOG_PATTERN or BZLOG_PATTERN, and it also contains OPTIONS.file_pattern.
I'm going to use one of the simplest ways of obtaining the list, which is to traverse through the list of directories, create a simple listing of contents, and then match the result against search patterns and use only files that satisfy both. The following code does just that and opens matched files for reading:
for DIR in DIRS:
    for file in (DIR + "/" + f for f in os.listdir(DIR)
                 if f.find(LOG_PATTERN) != -1 and f.find(OPTIONS.file_pattern) != -1):
        log_file = open(file, "r")  # open the matched file for reading
Take a closer look at the construct that builds the file list, which uses list comprehension syntax. This is a very powerful mechanism for creating lists of objects that you want to iterate through. With a list comprehension you can quickly and elegantly apply some validation or transformation to an existing list and get the new list immediately.
For example, here's what you'd do to quickly generate a list of the squares of all even numbers from 0 to 9:
>>> [x**2 for x in range(10) if x % 2 == 0]
[0, 4, 16, 36, 64]
The basic structure for list comprehension is
[ <operand> /operation/ for <operand> in <list> /if <check condition>/ ]
where <operand> is the variable that takes the value of each item in turn, /operation/ is an optional operation that you might need to perform on each element of the resulting list, <list> is the list of items you're iterating through, and /if <check condition>/ is an optional validation filter that keeps unwanted elements out of the resulting list.
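The optional parts of this structure are easier to see side by side. Here is a short sketch with made-up numbers, followed by the explicit loop that the last form corresponds to.

```python
numbers = [3, 8, 5, 12, 7]

copied = [n for n in numbers]                       # no operation, no condition
doubled = [n * 2 for n in numbers]                  # operation only
large = [n for n in numbers if n > 6]               # condition only
large_doubled = [n * 2 for n in numbers if n > 6]   # operation and condition

# The last comprehension written as an ordinary loop.
loop_result = []
for n in numbers:
    if n > 6:
        loop_result.append(n * 2)

print(large_doubled)
print(loop_result)
```

The condition is checked first, and the operation is applied only to the elements that pass it.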
With this in mind, if I dissect my file list construct, here's what I have:
• Each element of the resulting list will be constructed as DIR + "/" + f, where DIR is the directory name and f is gathered from os.listdir().
• The variable f is assigned in sequence to all elements of a list returned by calling os.listdir().
• Only those values are accepted that satisfy the condition (f.find(LOG_PATTERN) != -1 and f.find(OPTIONS.file_pattern) != -1), which requires them to match both LOG_PATTERN and OPTIONS.file_pattern.
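The three points above can be tried out in a self-contained sketch. The value ".log" for LOG_PATTERN, the file names, and the function name matching_files are assumptions made for this example, and I use os.path.join here instead of string concatenation. The results are sorted only because os.listdir returns names in arbitrary order.

```python
import os
import tempfile

LOG_PATTERN = ".log"   # assumed value, for illustration only

# Hypothetical directory with two log files and one unrelated file.
base = tempfile.mkdtemp()
for name in ("access.log", "error.log", "notes.txt"):
    open(os.path.join(base, name), "w").close()

def matching_files(directory, log_pattern, file_pattern):
    """Return full paths of files whose names contain both patterns."""
    return sorted(
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if f.find(log_pattern) != -1 and f.find(file_pattern) != -1
    )

# An empty file pattern matches every name, because find("") returns 0.
everything = matching_files(base, LOG_PATTERN, "")
only_access = matching_files(base, LOG_PATTERN, "access")

print([os.path.basename(p) for p in everything])
print([os.path.basename(p) for p in only_access])
```

This also shows why the empty default for the file pattern matches all file names: str.find returns -1 only when the substring is absent, and an empty string is found at position 0 of any string.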
Also note that the same syntax can produce either a list object or a generator: square brackets build the whole list at once, whereas parentheses create a generator. With a generator, the next element value is derived only when it is requested, for example in a for loop. Depending on the use, this may be much quicker and more memory-efficient than generating and holding the whole list in memory.
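Here is a brief sketch of the difference, reusing the even-squares example from earlier; the variable names are chosen just for this illustration.

```python
import types

squares_list = [x ** 2 for x in range(10) if x % 2 == 0]   # built immediately
squares_gen = (x ** 2 for x in range(10) if x % 2 == 0)    # nothing computed yet

is_generator = isinstance(squares_gen, types.GeneratorType)

first = next(squares_gen)        # computes only the first value
second = next(squares_gen)       # computes only the second value
remaining = list(squares_gen)    # computes whatever is left

print(squares_list)
print(first, second, remaining)
```

The list can be indexed and reused as often as you like, whereas the generator hands out each value once and is then exhausted.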