Assertions and Flags

One problem that affects many of the regexes we have looked at so far is that they can match more or different text than we intended. For example, the regex aircraft airplane jet will match waterjet and jetski as well as jet. This kind of problem can be solved by using assertions. An assertion does not match any text, but instead says something about the text at the point where the assertion occurs. One assertion is b (word boundary), which asserts that the character that precedes it must be a...

Creating Classes That Aggregate Collections

A simple way of representing a 2D color image is as a two-dimensional array with each array element being a color. So to represent a 100 x 100 image we must store 10000 colors. For the Image class (in file Image.py), we will take a potentially more efficient approach. An Image stores a single background color, plus the colors of those points in the image that differ from the background color. This is done by using a dictionary as a kind of sparse array, with each key being an (x, y) coordinate...

Creating Collection Classes Using Inheritance

D SortedDict.SortedDict(dict(s 1, A 2, y 6), str.lower) str(d) returns 'A' 17, 'n' 3, 's' 1, 'T' 5, 'z' 4 The SortedDict implementation uses both aggregation and inheritance. The sorted list of keys is aggregated as an instance variable, whereas the SortedDict class itself inherits the dict class. We will start our code review by looking at the class line and the initializer, and then we will look at all of the other methods in turn. def_init_(self, dictionary None, key None, **kwargs)...

Identifiers and Keywords

When we create a data item we can either assign it to a variable, or insert it Object into a collection. (As we noted in the preceding chapter, when we assign in refer- Python, what really happens is that we bind an object reference to refer to ences the object in memory that holds the data.) The names we give to our object 16 references are called identifiers or just plain names. A valid Python identifier is a nonempty sequence of characters of any length that consists of a start character and...

Using the Threading Module

Setting up two or more separate threads of execution in Python is quite straightforward. The complexity arises when we want separate threads to share data. Imagine that we have two threads sharing a list. One thread might start iterating over the list using for x in L and then somewhere in the middle another thread might delete some items in the list. At best this will lead to obscure crashes, at worst to incorrect results. One common solution is to use some kind of locking mechanism. For...

Catching and Raising Exceptions

Exceptions are caught using try except blocks, whose general syntax is try except exception_group1 as variable1 except_suite1 except exceptiongroupN as variableN There must be at least one except block, but both the else and the finally blocks are optional. The else block's suite is executed when the try block's suite has finished normally but it is not executed if an exception occurs. If there is a finally block, it is always executed at the end. Each except clause's exception group can be a...

Main WindowStyle Programs

Although dialog-style programs are often sufficient for simple tasks, as the range of functionality a program offers grows it often makes sense to create a complete main-window-style application with menus and toolbars. Such applications are usually easier to extend than dialog-style programs since we can add extra menus or menu options and toolbar buttons without affecting the main window's layout. In this section we will review the bookmarks-tk.pyw program shown in Figure 15.4. The program...

Parsing the Blocks Domain Specific Language

The blocks.py program is provided as one of the book's examples. It reads one Py-or more .blk files that use a custom text format blocks format, a made-up Parsing language that are specified on the command line, and for each one creates blocks an SVG Scalable Vector Graphics file with the same name, but with its suffix changed to .svg. While the rendered SVG files could not be accused of being pretty, they provide a good visual representation that makes it easy to see PLY mistakes in the .blk...

Command Line Programming

If we need a program to be able to process text that may have been redirected in the console or that may be in files listed on the command line, we can use the fileinput module's fileinput.input function. This function iterates over all the lines redirected from the console if any and over all the lines in the files listed on the command line, as one continuous sequence of lines. The module can report the current filename and line number at any time using fileinput.filename and fileinput.lineno...

BNF Syntax and Parsing Terminology

Parsing is a means of transforming data that is in some structured format whether the data represents actual data, or statements in a programming language, or some mixture of both into a representation that reflects the data's structure and that can be used to infer the meaning that the data represents. The parsing process is most often done in two phases lexing also called lexical analysis, tokenizing, or scanning , and parsing proper also called syntactic analysis . For example, given a...

Using the Multiprocessing Module

In some situations we already have programs that have the functionality we need but we want to automate their use. We can do this by using Python's sub-process module which provides facilities for running other programs, passing any command-line options we want, and if desired, communicating with them using pipes. We saw one very simple example of this in Chapter 5 when we used the subprocess.call function to clear the console in a platform-specific way. But we can also use these facilities to...

Piece Collection Data Types

It is often convenient to hold entire collections of data items. Python provides several collection data types that can hold items, including associative arrays and sets. But here we will introduce just two tuple and list. Python tuples and lists can be used to hold any number of data items of any data types. Tuples are immutable, so once they are created we cannot change them. Lists are mutable, so we can easily insert items and remove items whenever we want. Tuples are created using commas ,...

Raw Binary Data with Optional Compression

Writing our own code to handle raw binary data gives us complete control over our file format. It should also be safer than using pickles, since maliciously invalid data will be handled by our code rather than executed by the interpreter. When creating custom binary file formats it is wise to create a magic number to identify your file type, and a version number to identify the version of the file format in use. Here are the definitions used in the convert-incidents.py program MAGIC bAIB x00...

Lex YaccStyle Parsing with PLY

PLY Python Lex Yacc is a pure Python implementation of the classic Unix tools, lex and yacc. Lex is a tool that creates lexers, and yacc is a tool that creates parsers often using a lexer created by lex. PLY is described by its author, David Beazley, as reasonably efficient and well suited for larger grammars. It provides most of the standard lex yacc features including support for empty productions, precedence rules, error recovery, and support for ambiguous grammars. PLY is straightforward to...

Playlist Data Parsing

Hand- In the previous section's second subsection we created a handcrafted regex- PLY crafted based parser for .m3u files. In this subsection we will create a parser to do the m3u m u same thing, but this time using the PyParsing module. An extract from a .m3u parser file is shown in Figure 14.6 523 lt , and the BNF is shown in Figure 14.7 557 As we did when reviewing the previous subsection's .pls parser, we will review the .m3u parser in three parts first the creation of the parser, then the...

Thedel Special Method

Values Used Class Hierarchy

The_del_ self special method is called when an object is destroyed at least in theory. In practice,_del_ may never be called, even at program termination. Furthermore, when we write del x, all that happens is that the object reference x is deleted and the count of how many object references refer to the object that was referred to by x is decreased by 1. Only when this count reaches 0 is_del_ likely to be called, but Python offers no guarantee that it will ever be called. In view of this,_del_...

Generategridpy

One frequently occurring need is the generation of test data. There is no single generic program for doing this, since test data varies enormously. Python is often used to produce test data because it is so easy to write and modify Python programs. In this subsection we will create a program that generates a grid of random integers the user can specify how many rows and columns they want and over what range the integers should span. We'll start by looking at a sample run invalid literal for int...