Custom Modules

Since modules are just .py files, they can be created without formality. In this section we will look at two custom modules. The first module, TextUtil (in file TextUtil.py), contains just three functions: is_balanced(), which returns True if the string it is passed has balanced parentheses of various kinds; shorten() (shown earlier); and simplify(), a function that can strip spurious whitespace and other characters from a string. In the coverage of this module we will also see how to execute the code in docstrings as unit tests.

The second module, CharGrid (in file CharGrid.py), holds a grid of characters and allows us to "draw" lines, rectangles, and text onto the grid and to render the grid on the console. This module shows some techniques that we have not seen before and is more typical of larger, more complex modules.

The TextUtil Module

The structure of this module (and most others) differs little from that of a program. The first line is the shebang line, and then we have some comments (typically the copyright and license information). Next it is common to have a triple quoted string that provides an overview of the module's contents, often including some usage examples—this is the module's docstring. Here is the start of the TextUtil.py file (but with the license comment lines omitted):

#!/usr/bin/env python3
# Copyright (c) 2008-9 Qtrac Ltd. All rights reserved.
"""
This module provides a few string manipulation functions.

>>> is_balanced("(Python (is (not (lisp))))")
True
>>> shorten("The Crossing", 10)
'The Cro...'
>>> simplify(" some text with spurious whitespace ")
'some text with spurious whitespace'
"""

import string

This module's docstring is available to programs (or other modules) that import the module as TextUtil.__doc__. After the module docstring come the imports, in this case just one, and then the rest of the module.

We have already seen the shorten() function reproduced in full, so we will not repeat it here. And since our focus is on modules rather than on functions, although we will show the simplify() function in full, including its docstring, we will show only the code for is_balanced().

This is the simplify() function, broken into two parts:

def simplify(text, whitespace=string.whitespace, delete=""):
    r"""Returns the text with multiple spaces reduced to single spaces

    The whitespace parameter is a string of characters, each of which
    is considered to be a space.
    If delete is not empty it should be a string, in which case any
    characters in the delete string are excluded from the resultant
    string.

    >>> simplify(" this and\n that\t too")
    'this and that too'
    >>> simplify(" Washington D.C.\n")
    'Washington D.C.'
    >>> simplify(" Washington D.C.\n", delete=",;:.")
    'Washington DC'
    >>> simplify(" disemvoweled ", delete="aeiou")
    'dsmvwld'
    """

After the def line comes the function's docstring, laid out conventionally with a single line description, a blank line, further description, and then some examples written as though they were typed in interactively. Because the quoted strings are inside a docstring we must either escape the backslashes inside them, or do what we have done here and use a raw triple quoted string.

    result = []
    word = ""
    for char in text:
        if char in delete:
            continue
        elif char in whitespace:
            if word:
                result.append(word)
                word = ""
        else:
            word += char
    if word:
        result.append(word)
    return " ".join(result)

The result list is used to hold "words"—strings that have no whitespace or deleted characters. The given text is iterated over character by character, with deleted characters skipped. If a whitespace character is encountered and a word is in the making, the word is added to the result list and set to be an empty string; otherwise, the whitespace is skipped. Any other character is added to the word being built up. At the end a single string is returned consisting of all the words in the result list joined with a single space between each one.

The is_balanced() function follows the same pattern of having a def line, then a docstring with a single-line description, a blank line, further description, and some examples, and then the code itself. Here is the code without the docstring:

def is_balanced(text, brackets="()[]{}<>"):
    counts = {}
    left_for_right = {}
    for left, right in zip(brackets[::2], brackets[1::2]):
        assert left != right, "the bracket characters must differ"
        counts[left] = 0
        left_for_right[right] = left
    for c in text:
        if c in counts:
            counts[c] += 1
        elif c in left_for_right:
            left = left_for_right[c]
            if counts[left] == 0:
                return False
            counts[left] -= 1
    return not any(counts.values())

The function builds two dictionaries. The counts dictionary's keys are the opening characters ("(", "[", "{", and "<"), and its values are integers. The left_for_right dictionary's keys are the closing characters (")", "]", "}", and ">"), and its values are the corresponding opening characters. Once the dictionaries are set up the function iterates character by character over the text. Whenever an opening character is encountered, its corresponding count is incremented. Similarly, when a closing character is encountered, the function finds out what the corresponding opening character is. If the count for that character is 0 it means we have reached one closing character too many, so we can immediately return False; otherwise, the relevant count is decremented. At the end every count should be 0 if all the pairs are balanced, so if any one of them is not 0 the function returns False; otherwise, it returns True.
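One consequence of keeping a separate count for each kind of bracket is that the function checks that no kind closes more often than it has opened, but it does not check that different kinds are nested in order. The sketch below repeats the function so that it runs on its own and tries it on a few strings; the sample strings are our own illustrations, not doctests from the module.

```python
def is_balanced(text, brackets="()[]{}<>"):
    counts = {}
    left_for_right = {}
    for left, right in zip(brackets[::2], brackets[1::2]):
        assert left != right, "the bracket characters must differ"
        counts[left] = 0
        left_for_right[right] = left
    for c in text:
        if c in counts:
            counts[c] += 1
        elif c in left_for_right:
            left = left_for_right[c]
            if counts[left] == 0:
                return False  # one closing character too many
            counts[left] -= 1
    return not any(counts.values())


# Illustrative inputs (our own, not taken from TextUtil.py)
samples = ["(a [b] {c})", "(unclosed", "closed too soon)(", "([)]"]
for sample in samples:
    print(repr(sample), is_balanced(sample))
```

The last sample, "([)]", is reported as balanced because each kind of bracket is counted independently; a program that needs strict nesting would have to track the order of the opening characters as well.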

Up to this point everything has been much like any other .py file. If TextUtil.py was a program there would presumably be some more functions, and at the end we would have a single call to one of those functions to start off the processing. But since this is a module that is intended to be imported, defining functions is sufficient. And now, any program or module can import TextUtil and make use of it:

import TextUtil

text = " a puzzling conundrum "
text = TextUtil.simplify(text) # text == 'a puzzling conundrum'

If we want the TextUtil module to be available to a particular program, we just need to put TextUtil.py in the same directory as the program. If we want TextUtil.py to be available to all our programs, there are a few approaches that can be taken. One approach is to put the module in the Python distribution's site-packages subdirectory—this is usually C:\Python31\Lib\site-packages on Windows, but it varies on Mac OS X and other Unixes. This directory is in the Python path, so any module that is here will always be found. A second approach is to create a directory specifically for the custom modules we want to use for all our programs, and to set the PYTHONPATH environment variable to this directory. A third approach is to put the module in the local site-packages subdirectory—this is %APPDATA%\Python\Python31\site-packages on Windows and ~/.local/lib/python3.1/site-packages on Unix (including Mac OS X) and is in the Python path. The second and third approaches have the advantage of keeping our own code separate from the official installation.
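To see which directories a particular Python installation actually searches, we can inspect the path at runtime. This is a minimal sketch using the standard sys and site modules; the directories printed will differ from machine to machine, and any directories named in PYTHONPATH should appear among them.

```python
import site
import sys

# The directories Python searches for imported modules, in order
for directory in sys.path:
    print(directory)

# The per-user ("local") site-packages directory for this installation
user_site = site.getusersitepackages()
print("per-user site-packages:", user_site)
```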

Having the TextUtil module is all very well, but if we end up with lots of programs using it we might want to be more confident that it works as advertised.

One really simple way to do this is to execute the examples in the docstrings and make sure that they produce the expected results. This can be done by adding just three lines at the end of the module's .py file:

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Whenever a module is imported Python creates a variable for the module called __name__ and stores the module's name in this variable. A module's name is simply the name of its .py file but without the extension. So in this example, when the module is imported __name__ will have the value "TextUtil", and the if condition will not be met, so the last two lines will not be executed. This means that these last three lines have virtually no cost when the module is imported.

Whenever a .py file is run Python creates a variable for the program called __name__ and sets it to the string "__main__". So if we were to run TextUtil.py as though it were a program, Python will set __name__ to "__main__" and the if condition will evaluate to True and the last two lines will be executed.
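The two values of __name__ can be observed directly. The following sketch writes a tiny throwaway module to a temporary directory (the module name namedemo and the variable SEEN_NAME are made up for the illustration), runs it as a program, and then imports it.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# A throwaway module (hypothetical) that records the value of __name__
SOURCE = '''\
SEEN_NAME = __name__
if __name__ == "__main__":
    print(SEEN_NAME)
'''

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "namedemo.py"
    path.write_text(SOURCE)

    # Run as a program: the if condition is met, so the name is printed
    completed = subprocess.run([sys.executable, str(path)],
                               capture_output=True, text=True, check=True)
    name_when_run = completed.stdout.strip()

    # Import as a module: the if condition is not met, nothing is printed
    sys.path.insert(0, directory)
    try:
        namedemo = importlib.import_module("namedemo")
    finally:
        sys.path.remove(directory)
    name_when_imported = namedemo.SEEN_NAME

print(name_when_run)
print(name_when_imported)
```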

The doctest.testmod() function uses Python's introspection features to discover all the functions in the module and their docstrings, and attempts to execute all the docstring code snippets it finds. Running a module like this produces output only if there are errors. This can be disconcerting at first since it doesn't look like anything happened at all, but if we pass a command-line flag of -v, we will get output like this:

Trying:
    is_balanced("(Python (is (not (lisp))))")
Expecting:
    True
ok
...
Trying:
    simplify(" disemvoweled ", delete="aeiou")
Expecting:
    'dsmvwld'
ok
4 items passed all tests:
   3 tests in __main__
   5 tests in __main__.is_balanced
   3 tests in __main__.shorten
   4 tests in __main__.simplify
15 tests in 4 items.
15 passed and 0 failed.
Test passed.

We have used an ellipsis to indicate a lot of lines that have been omitted. If there are functions (or classes or methods) that don't have tests, these are listed when the -v option is used. Notice that the doctest module found the tests in the module's docstring as well as those in the functions' docstrings.

Examples in docstrings that can be executed as tests are called doctests. Note that when we write doctests, we are able to call simplify() and the other functions unqualified (since the doctests occur inside the module itself). Outside the module, assuming we have done import TextUtil, we must use the qualified names, for example, TextUtil.is_balanced().
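Doctests can also be run for a single function from code, which is convenient for experimenting. This is a minimal sketch using the doctest module's DocTestFinder and DocTestRunner classes; the double() function is a made-up example and is not part of TextUtil.

```python
import doctest


def double(x):
    """Returns twice x

    >>> double(2)
    4
    >>> double("ab")
    'abab'
    """
    return x * 2


# Collect the examples from double()'s docstring and execute them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(double, "double", globs={"double": double}):
    runner.run(test)

# summarize() returns a named tuple with failed and attempted counts
results = runner.summarize(verbose=False)
print(results.attempted, "attempted,", results.failed, "failed")
```

As with doctest.testmod(), nothing is reported for passing examples unless verbose output is requested, so we print the counts ourselves.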

In the next subsection we will see how to do more thorough tests—in particular, testing cases where we expect failures, for example, invalid data causing exceptions. (Testing is covered more fully in Chapter 9.) We will also address some other issues that arise when creating modules, including module initialization, accounting for platform differences, and ensuring that if the from module import * syntax is used, only the objects we want to be made public are actually imported into the importing program or module.

The CharGrid Module

The CharGrid module holds a grid of characters in memory. It provides functions for "drawing" lines, rectangles, and text on the grid, and for rendering the grid onto the console. Here are the module's docstring's doctests:

>>> add_vertical_line(2, 9, 12, "!")
>>> add_horizontal_line(3, 10, 20, "+")
>>> add_rectangle(0, 0, 5, 5, "%")
>>> add_rectangle(5, 7, 12, 40, "#", True)
>>> add_rectangle(7, 9, 10, 38, " ")
>>> add_text(8, 10, "This is the CharGrid module")
>>> add_text(1, 32, "Pleasantville", "@")
>>> add_rectangle(6, 42, 11, 46, fill=True)

The CharGrid.add_rectangle() function takes at least four arguments, the top-left corner's row and column and the bottom-right corner's row and column. The character used to draw the outline can be given as a fifth argument, and a Boolean indicating whether the rectangle should be filled (with the same character as the outline) as a sixth argument. The first time we call it we pass the third and fourth arguments by unpacking the 2-tuple (width, height), returned by the CharGrid.get_size() function.

By default, the CharGrid.render() function clears the screen before printing the grid, but this can be prevented by passing False as we have done here. Here is the grid that results from the preceding doctests:

[Figure: the rendered character grid]

The module begins in the same way as the TextUtil module, with a shebang line, copyright and license comments, and a module docstring that describes the module and has the doctests quoted earlier. Then the code proper begins with two imports, one of the sys module and the other of the subprocess module. The subprocess module is covered more fully in Chapter 10.

The module has two error-handling policies in place. Several functions have a char parameter whose actual argument must always be a string containing exactly one character; a violation of this requirement is considered to be a fatal coding error, so assert statements are used to verify the length. But passing out-of-range row or column numbers is considered erroneous but normal, so custom exceptions are raised when this happens.

We will now review some illustrative and key parts of the module's code, beginning with the custom exceptions:

class RangeError(Exception): pass
class RowRangeError(RangeError): pass
class ColumnRangeError(RangeError): pass

None of the functions in the module that raise an exception ever raise a RangeError; they always raise the specific exception depending on whether an out-of-range row or column was given. But by using a hierarchy, we give users of the module the choice of catching the specific exception, or of catching either of them by catching their RangeError base class. Note also that inside doctests the exception names are used as they appear here, but if the module is imported with import CharGrid, the exception names are, of course, CharGrid.RangeError, CharGrid.RowRangeError, and CharGrid.ColumnRangeError.
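The choice that the hierarchy offers to callers can be sketched as follows. The three exception classes are the module's own; the check_position() and describe() functions are hypothetical helpers written only for this illustration.

```python
class RangeError(Exception): pass
class RowRangeError(RangeError): pass
class ColumnRangeError(RangeError): pass


def check_position(row, column, max_rows=25, max_columns=80):
    """Hypothetical helper, not part of CharGrid"""
    if not 0 <= row < max_rows:
        raise RowRangeError()
    if not 0 <= column < max_columns:
        raise ColumnRangeError()


def describe(row, column):
    try:
        check_position(row, column)
    except RowRangeError:
        return "bad row"  # the specific exception is caught first
    except RangeError as err:
        # the base class catches any remaining range exception
        return "other range problem: " + type(err).__name__
    return "ok"


print(describe(30, 0))
print(describe(0, 99))
print(describe(1, 1))
```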

_CHAR_ASSERT_TEMPLATE = ("char must be a single character: '{0}' "
                         "is too long")

Here we define some private data for internal use by the module. We use leading underscores so that if the module is imported using from CharGrid import *, none of these variables will be imported. (An alternative approach would be to set an __all__ list.) The _CHAR_ASSERT_TEMPLATE is a string for use with the str.format() function; we will see it used to give an error message in assert statements. We will discuss the other variables as we encounter them.
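The effect of leading underscores and of an __all__ list on the from module import * syntax can be checked with a small experiment. This sketch builds a throwaway module in memory; the module name stardemo, its variables, and the star_import_names() helper are all made up for the illustration.

```python
import sys
import types

# Source for a throwaway module (hypothetical names)
SOURCE = '''\
_secret = 1
visible = 2
also_visible = 3
'''


def star_import_names(source, module_name="stardemo"):
    """Returns the names that from module_name import * brings in"""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module  # make the module importable
    try:
        namespace = {}
        exec("from {0} import *".format(module_name), namespace)
    finally:
        del sys.modules[module_name]
    # exec() adds __builtins__ to the namespace, so we leave it out
    return sorted(name for name in namespace if name != "__builtins__")


print(star_import_names(SOURCE))
print(star_import_names(SOURCE + '__all__ = ["visible"]\n'))
```

Without an __all__ list only the names that do not begin with an underscore are imported; with one, only the names it lists are imported.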

if sys.platform.startswith("win"):
    def clear_screen():
        subprocess.call(["cmd.exe", "/C", "cls"])
else:
    def clear_screen():
        subprocess.call(["clear"])
clear_screen.__doc__ = """Clears the screen using the underlying \
window system's clear screen command"""

The means of clearing the console screen is platform-dependent. On Windows we must execute the cmd.exe program with appropriate arguments and on most Unix systems we execute the clear program. The subprocess module's subprocess.call() function lets us run an external program, so we can use it to clear the screen in the appropriate platform-specific way. The sys.platform string holds the name of the operating system the program is running on, for example, "win32" or "linux2" (plain "linux" from Python 3.3 onward). So one way of handling the platform differences would be to have a single clear_screen() function like this:

def clear_screen():
    command = (["clear"] if not sys.platform.startswith("win") else
               ["cmd.exe", "/C", "cls"])
    subprocess.call(command)

The disadvantage of this approach is that even though we know the platform cannot change while the program is running, we perform the check every time the function is called.

To avoid checking which platform the program is being run on every time the clear_screen() function is called, we have created a platform-specific clear_screen() function once when the module is imported, and from then on we always use it. This is possible because the def statement is a Python statement like any other; when the interpreter reaches the if it executes either the first or the second def statement, dynamically creating one or the other clear_screen() function. Since the function is not defined inside another function (or inside a class as we will see in the next chapter), it is still a global function, accessible like any other function in the module.

After creating the function we explicitly set its docstring; this avoids us having to write the same docstring in two places, and also illustrates that a docstring is simply one of the attributes of a function. Other attributes include the function's module and its name.
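The same pattern, and the attributes it relies on, can be tried without clearing anything. In this sketch the line_ending() function is a made-up stand-in for clear_screen(); only one of the two def statements is executed when the code is loaded, and the docstring is attached afterward.

```python
import sys

# Only one of these def statements is executed, depending on the platform
if sys.platform.startswith("win"):
    def line_ending():
        return "\r\n"
else:
    def line_ending():
        return "\n"
line_ending.__doc__ = """Returns the platform's conventional line ending"""

# A docstring is just one of a function's attributes
print(line_ending.__name__)
print(line_ending.__module__)
print(line_ending.__doc__)
```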

def resize(max_rows, max_columns, char=None):
    """Changes the size of the grid, wiping out the contents and
    changing the background if the background char is not None
    """
    assert max_rows > 0 and max_columns > 0, "too small"
    global _grid, _max_rows, _max_columns, _background_char
    if char is not None:
        assert len(char) == 1, _CHAR_ASSERT_TEMPLATE.format(char)
        _background_char = char
    _max_rows = max_rows
    _max_columns = max_columns
    _grid = [[_background_char for column in range(_max_columns)]
             for row in range(_max_rows)]

This function uses an assert statement to enforce the policy that it is a coding error to attempt to resize the grid smaller than 1 x 1. If a background character is specified an assert is used to guarantee that it is a string of exactly one character; if it is not, the assertion error message is the _CHAR_ASSERT_TEMPLATE's text with the {0} replaced with the given char string.

Unfortunately, we must use the global statement because we need to update a number of global variables inside this function. This is something that using an object-oriented approach can help us to avoid, as we will see in Chapter 6.

The _grid is created using a list comprehension inside a list comprehension. Using list replication such as [[char] * columns] * rows will not work because the inner list will be shared (shallow-copied). We could have used nested for ... in loops instead:

_grid = []
for row in range(_max_rows):
    _grid.append([])
    for column in range(_max_columns):
        _grid[-1].append(_background_char)

This code is arguably trickier to understand than the list comprehension, and is much longer.
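The sharing problem with list replication is easy to demonstrate. In this sketch the variable names and the tiny 2 x 3 grid are our own; the point is that replicating the outer list copies references to one inner list, whereas the nested list comprehension creates a fresh inner list for each row.

```python
rows, columns, char = 2, 3, "."

# Replication: every row is the very same list object
shared = [[char] * columns] * rows
shared[0][0] = "X"
print(shared)

# Nested list comprehension: each row is a separate list
independent = [[char for column in range(columns)] for row in range(rows)]
independent[0][0] = "X"
print(independent)
```

Setting one cell in the replicated grid changes that column in every row, which is why the module builds its grid with comprehensions.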

We will review just one of the drawing functions to give a flavor of how the drawing is done, since our primary concern is with the implementation of the module. Here is the add_horizontal_line() function, split into two parts:

def add_horizontal_line(row, column0, column1, char="-"):
    """Adds a horizontal line to the grid using the given char

    >>> add_horizontal_line(8, 20, 25, "=")
    >>> char_at(8, 20) == char_at(8, 24) == "="
    True
    >>> add_horizontal_line(31, 11, 12)
    Traceback (most recent call last):
    ...
    RowRangeError
    """

The docstring has two tests, one that is expected to work and another that is expected to raise an exception. When dealing with exceptions in doctests the pattern is to specify the "Traceback" line, since that is always the same and tells the doctest module an exception is expected, then to use an ellipsis to stand for the intervening lines (which vary), and ending with the exception line we expect to get. The char_at() function is one of those provided by the module; it returns the character at the given row and column position in the grid.

    assert len(char) == 1, _CHAR_ASSERT_TEMPLATE.format(char)
    try:
        for column in range(column0, column1):
            _grid[row][column] = char
    except IndexError:
        # Work out which of the two indexes was out of range
        if not 0 <= row < _max_rows:
            raise RowRangeError()
        raise ColumnRangeError()

The code begins with the same character length check that is used in the resize() function. Rather than explicitly checking the row and column arguments, the function works by assuming that the arguments are valid. If an IndexError exception occurs because a nonexistent row or column is accessed, we catch the exception and raise the appropriate module-specific exception in its place. This style of programming is known colloquially as "it's easier to ask forgiveness than permission", and is generally considered more Pythonic (good Python programming style) than "look before you leap", where checks are made in advance. Relying on exceptions to be raised rather than checking in advance is more efficient when exceptions are rare. (Assertions don't count as "look before you leap" because they should never occur in deployed code, and are often commented out there.)
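The two styles can be compared side by side. This is a minimal sketch on a small stand-alone grid; the function names set_char_lbyl() and set_char_eafp() are hypothetical, and for brevity they return a Boolean instead of raising the module's exceptions.

```python
grid = [[" " for column in range(4)] for row in range(3)]


def set_char_lbyl(row, column, char):
    """Look before you leap: check the indexes in advance"""
    if 0 <= row < len(grid) and 0 <= column < len(grid[row]):
        grid[row][column] = char
        return True
    return False


def set_char_eafp(row, column, char):
    """Easier to ask forgiveness: just try it and handle the failure"""
    try:
        grid[row][column] = char
    except IndexError:
        return False
    return True


print(set_char_eafp(1, 2, "x"))   # valid position
print(set_char_eafp(3, 0, "x"))   # row out of range
print(set_char_lbyl(0, 4, "x"))   # column out of range
# Negative indexes count from the end of a list, so they do not raise
# IndexError here; only the explicit check rejects them
print(set_char_eafp(-1, 0, "y"), set_char_lbyl(-1, 0, "y"))
```

One design point worth noticing is that the forgiveness style accepts whatever list indexing accepts, including small negative indexes, so the two versions are not exactly equivalent.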

Almost at the end of the module, after all the functions have been defined, there is a single call to resize():

resize(_max_rows, _max_columns)

This call initializes the grid to the default size (25 x 80) and ensures that code that imports the module can safely make use of it immediately. Without this call, every time the module was imported, the importing program or module would have to call resize() to initialize the grid, forcing programmers to remember that fact and also leading to multiple initializations.

if __name__ == "__main__":
    import doctest
    doctest.testmod()

The last three lines of the module are the standard ones for modules that use the doctest module to check their doctests. (Testing is covered more fully in Chapter 9.)

The CharGrid module has an important failing: It supports only a single character grid. One solution to this would be to hold a collection of grids in the module, but that would mean that users of the module would have to provide a key or index with every function call to identify which grid they were referring to. In cases where multiple instances of an object are required, a better solution is to create a module that defines a class (a custom data type), since we can create as many class instances (objects of the data type) as we like. An additional benefit of creating a class is that we should be able to avoid using the global statement by storing class (static) data. We will see how to create classes in the next chapter.
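To give a rough idea of the class-based alternative, here is a hypothetical sketch; it is not the class developed in the next chapter, and the Grid name, its methods, and its parameters are our own assumptions. For simplicity it keeps each grid's data in the instance, so no global statement is needed and any number of grids can coexist.

```python
class Grid:
    """Hypothetical sketch of a grid held in an object"""

    def __init__(self, max_rows=25, max_columns=80, background_char=" "):
        assert max_rows > 0 and max_columns > 0, "too small"
        assert len(background_char) == 1, "char must be a single character"
        # Each instance has its own rows, so grids never interfere
        self.rows = [[background_char for column in range(max_columns)]
                     for row in range(max_rows)]

    def add_text(self, row, column, text):
        for offset, char in enumerate(text):
            self.rows[row][column + offset] = char

    def render(self):
        return "\n".join("".join(row) for row in self.rows)


first = Grid(2, 6, ".")
second = Grid(1, 4, "-")
first.add_text(0, 1, "hi")
print(first.render())
print(second.render())
```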
