String Handling

The string module provides some useful constants such as string.ascii_letters and string.hexdigits. It also provides the string.Formatter class, which we can subclass to provide custom string formatters.* The textwrap module can be used to wrap lines of text to a specified width and to minimize indentation.
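
As a rough illustration of these features, the following sketch uses the two constants mentioned above, wraps and dedents some text with textwrap, and subclasses string.Formatter. The UpperFormatter class and its "u" format specifier are hypothetical names invented for this example; they are not part of the standard library.

```python
import string
import textwrap

# Constants provided by the string module.
letters = string.ascii_letters   # lowercase a-z followed by uppercase A-Z
hex_digits = string.hexdigits    # '0123456789abcdefABCDEF'

# textwrap.fill() wraps text so that no line is longer than the given width.
text = ("The textwrap module can be used to wrap lines of text "
        "to a specified width.")
wrapped = textwrap.fill(text, width=30)

# textwrap.dedent() removes the whitespace common to the start of every line.
indented = "    first line\n      second line\n"
dedented = textwrap.dedent(indented)


class UpperFormatter(string.Formatter):
    """Hypothetical formatter: the made-up "u" specifier uppercases a field."""

    def format_field(self, value, format_spec):
        if format_spec == "u":
            return str(value).upper()
        # Fall back to the standard behavior for every other specifier.
        return super().format_field(value, format_spec)


shouted = UpperFormatter().format("{0:u} and {1}", "hi", 5)

print(wrapped)
print(dedented, end="")
print(shouted)
```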


Python's most powerful string handling module is the re (regular expression) module. This is covered in Chapter 13.

The io.StringIO class can provide a string-like object that behaves like an in-memory text file. This can be convenient if we want to use the same code that writes to a file to write to a string.

Example: The io.StringIO Class

Python provides two different ways of writing text to files. One way is to use a file object's write() method, and the other is to use the print() function with the file keyword argument set to a file object that is open for writing. For example:

print("An error message", file=sys.stdout)
sys.stdout.write("Another error message\n")

The struct module provides functions for packing and unpacking numbers, Booleans, and strings to and from bytes objects using their binary representations. This can be useful when handling data to be sent to or received from low-level libraries written in C. The struct and textwrap modules are used by the convert-incidents.py program covered in Chapter 7.
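
A minimal sketch of packing and unpacking with struct follows. The record layout (an integer, a Boolean, and a five-byte string) and the variable names are assumptions chosen for illustration; they are not taken from convert-incidents.py.

```python
import struct

# Format string: "<" = little-endian with no padding,
# "i" = 4-byte signed integer, "?" = Boolean, "5s" = 5-byte string.
record_format = "<i?5s"

# Pack the three values into a bytes object.
packed = struct.pack(record_format, 1024, True, b"hello")

# Unpack them again; the result is always a tuple.
number, flag, name = struct.unpack(record_format, packed)

print(len(packed), number, flag, name)
```

Note that strings must be supplied as bytes objects, so text normally has to be encoded before packing and decoded after unpacking.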

The difflib module provides classes and methods for comparing sequences, such as strings, and is able to produce output both in standard "diff" formats and in HTML.
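
The sketch below shows one possible way to use difflib for both kinds of output. The sample lines and the file names old.txt and new.txt are made up for the example.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "delta\n", "gamma\n"]

# Standard unified "diff" output, produced lazily one line at a time.
diff_lines = list(difflib.unified_diff(old, new,
                                       fromfile="old.txt", tofile="new.txt"))

# An HTML table showing the two sequences side by side.
html_table = difflib.HtmlDiff().make_table(old, new)

# A similarity ratio between 0.0 and 1.0 for two strings.
ratio = difflib.SequenceMatcher(None, "string", "strong").ratio()

print("".join(diff_lines), end="")
```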

* The term subclassing (or specializing) is used for when we create a custom data type (a class) based on another class. Chapter 6 gives full coverage of this topic.

Both lines of text are printed to sys.stdout, a file object that represents the "standard output stream". This is normally the console, and it differs from sys.stderr, the "error output stream", only in that the latter is unbuffered. (Python automatically creates and opens sys.stdin, sys.stdout, and sys.stderr at program start-up.) The print() function adds a newline by default, although we can stop this by giving the end keyword argument set to an empty string.

In some situations it is useful to be able to capture into a string the output that is intended to go to a file. This can be achieved using the io.StringIO class which provides an object that can be used just like a file object, but which holds any data written to it in a string. If the io.StringIO object is given an initial string, it can also be read as though it were a file.
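
For instance, an io.StringIO object created with an initial string can be read with the usual file methods. This is only a small sketch, and the sample text is invented:

```python
import io

# Create a file-like object whose "contents" are the initial string.
source = io.StringIO("first line\nsecond line\nthird line\n")

first = source.readline()   # reads up to and including the first newline
rest = source.read()        # reads everything that remains

# Like a text file, an io.StringIO object can also be iterated line by line.
lines = [line.rstrip("\n") for line in io.StringIO("a\nb\n")]

print(first, end="")
print(rest, end="")
print(lines)
```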

Once we have imported the io module we can access io.StringIO, and we can use it to capture output destined for a file object such as sys.stdout:

sys.stdout = io.StringIO()

If this line is put at the beginning of a program, after the imports but before any use is made of sys.stdout, any text that is sent to sys.stdout will actually be sent to the io.StringIO file-like object which this line has created and which has replaced the standard sys.stdout file object. Now, when the print() and sys.stdout.write() lines shown earlier are executed, their output will go to the io.StringIO object instead of the console. (At any time we can restore the original sys.stdout with the statement sys.stdout = sys.__stdout__.)

We can obtain all the strings that have been written to the io.StringIO object by calling its getvalue() method, in this case by calling sys.stdout.getvalue(). The return value is a string containing all the lines that have been written. This string could be printed, saved to a log, or sent over a network connection like any other string. We will see another example of io.StringIO use a bit further on.
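
Putting these pieces together, a capture-and-restore sequence might look like the sketch below. It saves a reference to the current sys.stdout and restores that, on the assumption that something else may already have redirected output; assigning sys.__stdout__ (two underscores on each side) would restore the stream Python opened at start-up instead.

```python
import io
import sys

original_stdout = sys.stdout      # remember where output was going
sys.stdout = io.StringIO()        # from now on, output is captured
try:
    print("An error message", file=sys.stdout)
    sys.stdout.write("Another error message\n")
    # Retrieve everything written so far as a single string.
    captured = sys.stdout.getvalue()
finally:
    # Always put the previous stream back, even if an exception occurred.
    sys.stdout = original_stdout

# The captured text is an ordinary string and can be used like any other.
print(captured, end="")
```

The try ... finally block is a design choice made for this sketch: it ensures that console output is not lost for the rest of the program if the captured code raises an exception.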
