Data Container Responsibilities

The application's data container is responsible for holding all the data items, that is, the movie records, and for saving and loading them to and from disk. We saw in the preceding section, when we looked at the MainWindow.updateTable() method, how the container could be iterated over using a for loop to get all the movies so that they could be displayed in the application's QTableWidget. In this section, we will look at the functionality provided by the moviedata module, including the data structures used to hold the movie data and how we provide support for ordered iteration. We exclude the actual saving and loading code, since that is covered in the sections that follow.

Why use a custom data container at all? After all, we could simply use one of Python's built-in data structures, such as a list or a dictionary. We prefer to take an approach where we wrap a standard data structure in a custom container class. This ensures that accesses to the data are controlled by our class, which helps to maintain data integrity. It also makes it easier to extend the container's functionality, and to replace the underlying data structure in the future, without affecting existing code. In other words, this is an object-oriented approach that avoids the disadvantages of simply using, say, a list, with some global functions.

We will begin with the moviedata module's imports and constants.

import bisect
import codecs
import copy_reg
import cPickle
import gzip
from PyQt4.QtCore import *
from PyQt4.QtXml import *

The codecs module is necessary for reading and writing Python text files using a specific text codec. The copy_reg and cPickle modules are used for saving and loading Python "pickles"—these are files that contain arbitrary Python data structures. The gzip module is used to compress data; we will use it to compress and decompress our pickled data. The PyQt4.QtCore import is familiar, but we must also import the PyQt4.QtXml module to give us access to PyQt's SAX and DOM parsers. We will see all of these modules in use in the following sections. Note that we do not need the PyQt4.QtGui module, since the moviedata module is a pure data-handling module with no GUI functionality.

We store the movies in canonicalized title order, ignoring case, and ignoring leading "A", "An", and "The" words. To minimize insertion and lookup times we maintain the order using the bisect module, using the same techniques we used for the OrderedDict we implemented in Chapter 3.

NEWPARA = unichr(0x2029)
NEWLINE = unichr(0x2028)

We want to use the UTF-8 codec for text files. This is an 8-bit Unicode encoding that uses one byte for each ASCII character, and two or more bytes for any other character. It is probably the most widely used Unicode encoding for text files. By using Unicode we can store text written in just about any human language in use today.
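The byte counts described above can be checked with a short sketch. It is written for Python 3 rather than the Python 2 used in the chapter's listings, and the three sample characters are our own choice for illustration.

```python
# Sample characters chosen for illustration only.
samples = ["A", "\u00e9", "\u20ac"]   # ASCII letter, e-acute, euro sign

# Number of bytes each character occupies once encoded as UTF-8.
sizes = [len(ch.encode("utf-8")) for ch in samples]

for ch, size in zip(samples, sizes):
    print("U+%04X -> %d byte(s)" % (ord(ch), size))
```

The ASCII letter needs a single byte, while the two non-ASCII characters need more.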

Although \n is a valid Unicode character, we will need to use the Unicode-specific paragraph break and line break characters when we use XML. This is because XML parsers do not normally distinguish between one ASCII whitespace character, such as newline, and another, such as space, which is not convenient if we want to preserve the user's line and paragraph breaks.
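One way the two separator characters might be used is sketched below, in Python 3 (so chr() replaces unichr()). The helper names encoded_newlines() and decoded_newlines() are hypothetical, and the convention that a blank line marks a paragraph break is an assumption made for this example.

```python
NEWPARA = chr(0x2029)  # Unicode paragraph separator
NEWLINE = chr(0x2028)  # Unicode line separator


def encoded_newlines(text):
    """Replace blank-line paragraph breaks first, then single newlines."""
    return text.replace("\n\n", NEWPARA).replace("\n", NEWLINE)


def decoded_newlines(text):
    """Reverse encoded_newlines()."""
    return text.replace(NEWPARA, "\n\n").replace(NEWLINE, "\n")


notes = "First line\nsecond line\n\nNew paragraph"
encoded = encoded_newlines(notes)
restored = decoded_newlines(encoded)
```

After encoding, the text contains no ASCII newlines for an XML parser to collapse, and decoding restores the original breaks.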

class Movie(object):

    UNKNOWNYEAR = 1890
    UNKNOWNMINUTES = 0

    def __init__(self, title=None, year=UNKNOWNYEAR,
                 minutes=UNKNOWNMINUTES, acquired=None, notes=None):
        self.title = title
        self.year = year
        self.minutes = minutes
        self.acquired = acquired \
                if acquired is not None else QDate.currentDate()
        self.notes = notes

The Movie class is used to hold the data about one movie. We use instance variables directly rather than providing simple getters and setters. The title and notes are stored as QStrings, and the date acquired as a QDate. The year the movie was released and its duration in minutes are held as ints. We provide two static constants to indicate that we do not know when the movie was released or how long it is.

We are now ready to look at the movie container class. This class holds an ordered list of movies, and provides functionality for saving and loading (and exporting and importing) movies in a variety of formats.

class MovieContainer(object):

    MAGIC_NUMBER = 0x3051E
    FILE_VERSION = 100

The MAGIC_NUMBER and FILE_VERSION are used for saving and loading files using PyQt's QDataStream class.

The filename is held as a QString. Each element of the _movies list is itself a two-element list, the first element being a sort key and the second a Movie. This is the class's main data structure, and it is used to hold the movies in order. The _movieFromId dictionary's keys are the id()s of Movie objects, and the values are Movies. As we saw in Chapter 1, every Python object very conveniently has a unique ID, available by calling id() on it. This dictionary is used to provide fast movie lookup when a movie's ID is known. For example, the main window stores movie IDs as "user" data in its first column of QTableWidgetItems. There is no duplication of data, of course, since the two data structures really hold references to Movie objects rather than Movie objects themselves.
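The relationship between the two structures can be illustrated with a minimal sketch. The Movie class here is a cut-down stand-in with only a title and year, the variable names are our own, and the code is written for Python 3. Note that an id() is only guaranteed to be unique among objects that exist at the same time.

```python
class Movie(object):
    """Cut-down stand-in for the Movie class (title and year only)."""

    def __init__(self, title, year):
        self.title = title
        self.year = year


movie = Movie("Metropolis", 1927)

# The same object is referenced from both structures; nothing is copied.
movies = [["metropolis\t1927", movie]]      # ordered [key, movie] pairs
movie_from_id = {id(movie): movie}          # fast lookup by id()

# A caller that kept only the integer ID can get the object back.
stored_id = id(movie)
found = movie_from_id[stored_id]
found.title = "Metropolis (restored)"       # visible through the list too
```

Because both structures refer to one object, a change made through the dictionary is seen when the list is iterated.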

    def __iter__(self):
        for pair in iter(self._movies):
            yield pair[1]

When the MainWindow.updateTable() method iterated over the movie container using a for loop, Python used the container's __iter__() method. Here we can see that we iterate over the ordered list of [key, movie] lists, returning just the movie item each time.

    def __len__(self):
        return len(self._movies)

This method allows us to use the len() function on movie containers.
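To see the two special methods working together, here is a minimal sketch of a container that supports both for loops and len(). The TinyContainer class and its contents are hypothetical, with plain strings standing in for Movie objects.

```python
class TinyContainer(object):
    """Hypothetical cut-down container holding [key, item] pairs."""

    def __init__(self, pairs):
        self._items = sorted(pairs)

    def __iter__(self):
        # Generator: hand back only the item, hiding the sort key.
        for pair in self._items:
            yield pair[1]

    def __len__(self):
        return len(self._items)


container = TinyContainer([["b", "Brazil"], ["a", "Alien"]])
titles = [title for title in container]
count = len(container)
```

Callers never see the sort keys; they simply receive the items in key order.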

In the following sections we will see the code for loading and saving the movies held in a movie container in various formats. But first we will look at how the container is cleared, and how movies are added, deleted, and updated, so that we can get a feel for how the container works, particularly regarding ordering.

    def clear(self, clearFilename=True):
        self._movies = []
        self._movieFromId = {}
        if clearFilename:
            self._fname = QString()
        self._dirty = False

This method is used to clear all the data, possibly including the filename. It is called from MainWindow.fileNew(), which does clear the filename, and from the various save and load methods, which leave the filename untouched. The movie container maintains a dirty flag so that it always knows whether there are unsaved changes.

    def add(self, movie):
        if id(movie) in self._movieFromId:
            return False
        key = self.key(movie.title, movie.year)
        bisect.insort_left(self._movies, [key, movie])
        self._movieFromId[id(movie)] = movie
        self._dirty = True
        return True

The first if statement ensures that we don't add the same movie twice. We use the key() method to generate a suitable order key, and then use the bisect module's insort_left() function to insert the two-element [key, movie] list into the _movies list. Finding the insertion point is very fast because the bisect module uses the binary chop (binary search) algorithm. We also make sure that the _movieFromId dictionary is up-to-date, and set the container to be dirty.
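The effect of insort_left() can be seen in isolation in the following sketch, where plain title strings stand in for Movie objects and a lowercased title stands in for the real key. In Python 3, two entries with equal keys would be compared on their second element, so real Movie objects would need either unique keys or an ordering of their own.

```python
import bisect

# Each entry is a [key, title] pair; strings stand in for Movie objects.
movies = []
for title in ["Vertigo", "Alien", "Metropolis"]:
    bisect.insort_left(movies, [title.lower(), title])

ordered_titles = [pair[1] for pair in movies]
```

The list stays sorted after every insertion, so no separate sort step is needed.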

    def key(self, title, year):
        text = unicode(title).lower()
        if text.startswith("a "):
            text = text[2:]
        elif text.startswith("an "):
            text = text[3:]
        elif text.startswith("the "):
            text = text[4:]
        parts = text.split(" ", 1)
        if parts[0].isdigit():
            text = "%08d " % int(parts[0])
            if len(parts) > 1:
                text += parts[1]
        return u"%s\t%d" % (text.replace(" ", ""), year)

This method generates a key string suitable for ordering our movie data. We do not guarantee key uniqueness (although it would not be difficult to do), because the ordered data structure is a list in which duplicate keys are not a problem. The code is English-specific, eliminating the definite and indefinite articles from movie titles. If the movie's title begins with a number, we pad the number with leading zeros so that, for example, "20" will come before "100". We do not need to pad the year, because years are always exactly four digits. All the other data is stored using PyQt data types, but we have chosen to use unicode for the key strings.
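To make the resulting keys concrete, here is a standalone sketch of the same logic. It is a Python 3 rendering that takes plain str titles instead of QStrings, the function name sort_key() is our own, and the sample titles are chosen only for illustration.

```python
def sort_key(title, year):
    """Python 3 rendering of the key() logic, using plain str titles."""
    text = title.lower()
    for article in ("a ", "an ", "the "):
        if text.startswith(article):
            text = text[len(article):]
            break
    parts = text.split(" ", 1)
    if parts[0].isdigit():
        # Zero-pad a leading number so that 20 sorts before 100.
        text = "%08d " % int(parts[0])
        if len(parts) > 1:
            text += parts[1]
    return "%s\t%d" % (text.replace(" ", ""), year)


keys = [sort_key("The Matrix", 1999),
        sort_key("A Beautiful Mind", 2001),
        sort_key("20 Million Miles to Earth", 1957),
        sort_key("100 Rifles", 1969)]
```

Without the zero padding, the key for "100 Rifles" would sort before the key for "20 Million Miles to Earth", because the strings are compared character by character.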

    def delete(self, movie):
        if id(movie) not in self._movieFromId:
            return False
        key = self.key(movie.title, movie.year)
        i = bisect.bisect_left(self._movies, [key, movie])
        del self._movies[i]
        del self._movieFromId[id(movie)]
        self._dirty = True
        return True

To delete a movie we must remove it from both data structures, and in the case of the _movies list, we must first find the movie's index position.
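The index lookup can be sketched on its own as follows, again with strings standing in for Movie objects. The equality check before the deletion is a defensive addition for this sketch and is not part of the method shown above.

```python
import bisect

movies = [["alien\t1979", "Alien"],
          ["metropolis\t1927", "Metropolis"],
          ["vertigo\t1958", "Vertigo"]]

target = ["metropolis\t1927", "Metropolis"]
i = bisect.bisect_left(movies, target)

# Defensive check (an addition for this sketch): make sure the slot found
# really holds the entry we meant to remove before deleting it.
if i < len(movies) and movies[i] == target:
    del movies[i]

remaining = [pair[1] for pair in movies]
```

bisect_left() returns the position where the target would be inserted, which is also the position of an equal entry if one is present.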

    def updateMovie(self, movie, title, year, minutes=None,
                    notes=None):
        if minutes is not None:
            movie.minutes = minutes
        if notes is not None:
            movie.notes = notes
        if title != movie.title or year != movie.year:
            # Find the entry using its original title and year
            key = self.key(movie.title, movie.year)
            i = bisect.bisect_left(self._movies, [key, movie])
            # Give the entry its new sort key so that sorting moves it
            self._movies[i][0] = self.key(title, year)
            movie.title = title
            movie.year = year
            self._movies.sort()
        self._dirty = True

If the user edits a movie, the application always calls this method with the user's changes. If the minutes or notes are passed as None, we take that to mean that they have not been changed. If the movie's title or year has changed, the movie may now be in the wrong position in the _movies list. In these cases, we find the movie using its original title and year, set the new title and year, and then re-sort the list. This is not as expensive in practice as it may at first appear. The list will contain, at most, one incorrectly sorted item, and Python's sort algorithm is highly optimized for partially sorted data.
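The find, rekey, and re-sort sequence can be illustrated with a small sketch. The keys and titles are written out by hand here, and strings again stand in for Movie objects.

```python
import bisect

movies = [["alien\t1979", "Alien"],
          ["metropolis\t1927", "Metropolis"],
          ["vertigo\t1958", "Vertigo"]]

# Locate the entry by its old key, then give it a new key and title.
old = ["alien\t1979", "Alien"]
i = bisect.bisect_left(movies, old)
movies[i][0] = "zulu\t1964"
movies[i][1] = "Zulu"

# Only one entry is out of place, so the sort has very little work to do.
movies.sort()
ordered = [pair[1] for pair in movies]
```

The renamed entry moves from the front of the list to the end, and every other entry keeps its relative position.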

If we ever found that we had a performance problem here, we could always reimplement updateMovie() using delete() and add() instead.

    @staticmethod
    def formats():

Normally, we would provide one, or at most two, custom data formats for an application, but for the purposes of illustration we provide three formats using four extensions. Extension .mqb is Qt binary format, and it uses the QDataStream class, and extension .mpb is Python pickle format (using gzip compression). Extension .mqt is Qt text format, and it uses the QTextStream class, and extension .mpt is Python text format. Both text formats are identical, but by using different extensions we can use different save and load code for the purposes of comparison.

    def save(self, fname=QString()):
        if not fname.isEmpty():
            self._fname = fname
        if self._fname.endsWith(".mqb"):
            return self.saveQDataStream()
        elif self._fname.endsWith(".mpb"):
            return self.savePickle()
        elif self._fname.endsWith(".mqt"):
            return self.saveQTextStream()
        elif self._fname.endsWith(".mpt"):
            return self.saveText()
        return False, "Failed to save: invalid file extension"

When the user invokes the "file save" action we would expect the data container's save() method to be invoked. This is indeed what happens in My Movies and is the normal practice. However, here, instead of performing the save itself, the save() method hands the work to a method that is specific to the filename's extension. This is purely so that we can show how to save in the different formats; in a real application we would normally use only one format.

There is a corresponding load() method that has the same logic as the save() method and passes its work to extension-specific load methods. All the load and save methods return a two-element tuple: the first element is a Boolean success/failure flag and the second is a message, either an error message or a report of what successfully occurred.
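The (success, message) convention can be sketched without any PyQt dependency. Note that this sketch swaps the if/elif chain for a dictionary lookup, purely to keep it short. The saver functions are placeholders that write nothing, and all the names (save_binary(), save_text(), SAVERS) are hypothetical.

```python
import os


def save_binary(filename):
    # Placeholder: a real saver would write the file here.
    return True, "Saved binary data to %s" % filename


def save_text(filename):
    # Placeholder: a real saver would write the file here.
    return True, "Saved text data to %s" % filename


# Hypothetical mapping from extension to saver function.
SAVERS = {".mqb": save_binary, ".mqt": save_text}


def save(filename):
    ext = os.path.splitext(filename)[1].lower()
    saver = SAVERS.get(ext)
    if saver is None:
        return False, "Failed to save: invalid file extension"
    return saver(filename)


ok, msg = save("movies.mqt")
bad_ok, bad_msg = save("movies.doc")
```

A caller can unpack the tuple and show the message to the user, whether the operation succeeded or failed.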

We have now seen the application's infrastructure for file handling, and the container's data structures that hold the data in memory. In the following sections, we will look at the code that performs the saving and loading of the container's data to and from disk.
