Representing Tabular Data in Trees

Suppose we want the user to pick out a data item, but that the item they pick depends on some previous item they picked, which in turn depends on some previous item again. In concrete terms, imagine that we want the user to choose a particular airport—first they must choose the country, then the city, and then the airport. This could be done by providing three comboboxes, populating the first with country names, and populating the second with the cities in the current country, and the third with the airports in the current city. This is not difficult to program, but the user must use three separate widgets to specify their choice, and can't easily see what range of choices are available to them.

One solution to choosing dependent data items is to use a tree view. To continue the example, the roots of the tree would be the countries, and each country would have city branches, and each city branch would have airport leaves. This makes it much easier for the user to follow a path (and they can follow only valid paths), and easier for us to retrieve their complete country/city/airport choice.

Another benefit of using a tree view, compared, for example, with using a table view, is that it is more compact and easier to navigate. For example, if we had 100 countries, with an average of 4 cities each and an average of 2 airports per city, a table would require 100 x 4 x 2 = 800 rows—but a tree would need only 100 rows (one per country) with each row capable of being expanded to show its cities and airports.

In this section, we will show how to represent a table of data in a tree, and how to extract the complete "path" that the user has chosen. The example application we will use is called Server Info. It reads a dataset that has six columns—country, state (meaningful only in the United States), city, provider, server, and IP address—and allows users to specify one particular 6-tuple. The sample dataset has 163 rows, but refers to only 33 unique countries, so the user

Country/State (US)/City/Provlder

Server

Apache/1.3.4 193.140.83.20

I Turkey (No State) Ankara

-■■■ National Academic Backbone UlakNet Ankara S-m Ukraine 3 -X. United Kingdom USA u Arizona

Courtesy of Lotus Productions Apache/1.3.3 216.25.63.17 Lyceum Support Tools NCSA/1.4.2 205.142.28.7

" Apache/1.3.1 130.207.244.51

Southern CrossRoads

U S A * Georgl a* Atlanta*South ern C rossRoad s* Apache/1.3.1*130.207.244.51

Figure 16.3 Tabular data rendered as a tree need only navigate 33 top-level items rather than scrolling through almost five times that number of rows.

The heart of the application is provided by the TreeOfTableModel class, a QAbstractItemModel subclass that can represent arbitrary tabular data in a tree. We use a custom subclass of this model, along with a QTreeView subclass to present the data. The application itself can create the tree using different levels of nesting by running it from the console with a command-line argument of 1,2,3, or 4. Figure 16.3 shows the tree using the default nesting level of 3. (The nesting level does not include the leaf at the end of a series of branches.)

We will begin by reviewing the main form since it is very short. Then we will look at the table view subclass and the TreeOfTableModel subclass. Next, we will review the treeoftable module, including the BranchNode and LeafNode classes, and finally, the TreeOfTableModel class itself.

class MainForm(QMainWindow):

def_init_(self, filename, nesting, separator, parent=None):

super(MainForm, self)._init_(parent)

headers = ["Country/State (US)/City/Provider", "Server", "IP"] self.treeWidget = TreeOfTableWidget(filename, nesting, separator)

self.treeWidget.model().headers = headers self.setCentralWidget(self.treeWidget) self.connect(self.treeWidget, SIGNAL("activated"), self.activated)

self.setWindowTitle("Server Info")

The TreeOfTableWidget is similar to a convenience view, since it incorporates a model inside it. The model is a ServerModel, a small TreeOfTableModel subclass that adds the ability to show flag icons.

The filename is the name of a file that has data suitable for a TreeOfTableModel. In particular, it must have one record per line, and each column (field) must be separated by the specified separator.

The nesting value is the maximum number of branches that can spur off from a root, and does not count the leaves at the end. In this case, the nesting value passed in through the nesting parameter is 3 (unless it's changed on the command line), which means that we will have 3 levels of branches (country, state, city) and 1 level of leaves (provider). Since we have 6 fields, this means that the first 4 fields will be shown in the tree part of the tree widget, with the remaining 2 fields shown as separate columns in the rows that have leaves. The resultant tree view will have 3 columns, one containing the tree, and 2 more showing the extra fields. We set the model's headers by directly accessing the model inside the custom TreeOfTableWidget.

The activated() method is called when the user double-clicks or presses Enter on a row in the tree widget.

def activated(self, fields):

self.statusBar().showMessage("*".join(fields), 60000)

The "path", that is, the (country, city, state, provider, server, IP address) 6-tuple that the user has chosen, is shown "*"-separated in the status bar for a minute (60000 milliseconds), whenever a suitable row is activated. In this context, a suitable row is one containing a leaf, since these are the only ones that have all six fields.

The TreeOfTableWidget is a QTreeView subclass that contains the model it displays. It also provides a few simple convenience methods and creates some useful signal-slot connections.

class TreeOfTableWidget(QTreeView):

def_init_(self, filename, nesting, separator, parent=None):

super(TreeOfTableWidget, self)._init_(parent)

self.setSelectionBehavior(QTreeView.SelectItems)

self.setUniformRowHeights(True)

model = ServerModel(self)

self.setModel(model)

try:

model.load(filename, nesting, separator) except IOError, e:

QMessageBox.warning(self, "Server Info - Error", unicode(e))

self.connect(self, SIGNAL("activated(QModelIndex)"), self.activated)

self.connect(self, SIGNAL("expanded(QModelIndex)"), self.expanded) self.expanded()

The ServerModel is a TreeOfTableModel subclass. Its only purpose is to override the data() method so that it can provide suitable icons (country and state flags); we will review it shortly. After loading the model's data from the file and making the signal-slot connections, we call the expanded() method to give the columns suitable widths.

def expanded(self):

for column in range(self.model().columnCount(QModelIndex())): self.resizeColumnToContents(column)

Whenever the user expands a branch—for example, by clicking one of the tree's ffl symbols or by navigating with the arrow keys and pressing Right Arrow—this method is called. It ensures that the columns showing the expanded item's texts are wide enough for the text to be readable. In tree models, every item is either the child of another item (and therefore has a parent) or a top-level (root) item, in which case it has no parent, which is signified by an invalid model index. Therefore, when we call columnCount() with a QModelIndex() (i.e., with an invalid model index), we get the column count of top-level items.

def activated(self, index):

self.emit(SIGNAL("activated"), self.model().asRecord(index))

If the user activates an item by double-clicking it or by pressing Enter on it, this method is called, and in turn it emits its own activated() signal. Its parameter is the full path (record), as a list of field values, for the current model index.

def currentFields(self):

return self.model().asRecord(self.currentIndex())

This method provides the same information as the activated() signal, but it can be called at any time to get the current record; again, as a list of field values.

The ServerModel is a TreeOfTableModel subclass that reimplements one method, data(). It does so to show flags next to the names of countries and U.S. states.

class ServerModel(treeoftable.TreeOfTableModel):

def_init_(self, parent=None):

super(ServerModel, self)._init_(parent)

def data(self, index, role):

if role == Qt.DecorationRole:

node = self.nodeFromIndex(index) if node is None:

return QVariant() if isinstance(node, treeoftable.BranchNode):

return QVariant() filename = node.toString().replace(" ", "_") parent = node.parent.toString() if parent and parent != "USA":

return QVariant() if parent == "USA":

filename = "USA_" + filename filename = os.path.join(os.path.dirname(_file_),

"flags", filename + ".png") pixmap = QPixmap(filename) if pixmap.isNull():

return QVariant() return QVariant(pixmap) return treeoftable.TreeOfTableModel.data(self, index, role)

This data() reimplementation only handles data() requests where the role is Qt.DecorationRole, passing on any other request to the TreeOfTableModel base class. In list, table, and tree views, the decoration role is used to set or retrieve icons for data items.

Tree models work in terms of parents and children. In the TreeOfTableModel base class we have provided a method, nodeFromIndex(), that returns the node (item) corresponding to a particular model index. We have two kinds of nodes, branch nodes and leaf nodes. Each node can have any number of columns, although in this case the branch nodes have only one column and leaf nodes have at least one column. We provide icons for only the first (and only) column of branch nodes, and then only for the branches for countries and U.S. states.

The flag icons are stored in the flags subdirectory, with country flag names having underscores instead of spaces, and U.S. state names beginning with "USA_". All the flag icons are .png images. Instead of using a .qrc resource file, we retrieve the images directly from the filesystem. The os.path.dirname() function returns the path part of a full filename, and the os.path.join() function joins two or more strings to form a single path string with the appropriate path separators. If the required image does not exist or is unreadable, QPixmap.isNull() will return True; in this case, we return an invalid QVariant to signify that no icon is available. Otherwise, we return the pixmap wrapped in a QVariant.

The classes we have seen so far have been quite straightforward. This is because the real work of providing the tree model is done by the TreeOfTableModel. This model reads in a tabular dataset and converts the row/column data into a tree. The tree has a single branch node as its root, and then any number of branch nodes hanging off the root, with each branch able to have its own branches. At the end of each branch are one or more leaf nodes.

The nodes hanging off a branch are the branch's children. The children can be branches or leaves, and they are held in a list. Each child's position in its parent node's list of children is its row number. Column numbers refer to the items (fields) within a child (branch or leaf). A complete record (or "path") is the concatenation of all the fields in the root branch, all the intermediate branches, and the leaf at the end. The relationship between branches and leaves is shown schematically in Figure 16.4.

Connecting Widgets Pyqt
Figure 16.4 Schematic of a tree model's branches and leaves

In the tree of table model we have chosen to keep each branch's children in alphabetical order. To make this as fast and easy as possible, each branch's children list is actually a list of two-item lists, with the first item being the order key and the second item being the child node. We access the items in these two-item lists using the constants KEY and NODE rather than the literals 0 and 1.

We will now look at the branch node and leaf node implementations, and then at the tree of table model itself.

The branch and leaf nodes have many methods in common because in some contexts, they can be used interchangeably (thanks to duck typing).

class BranchNode(object):

def_init_(self, name, parent=None):

super(BranchNode, self)._init_(parent)

self.name = name self.parent = parent self.children = []

A branch node's name is the text shown in its first (and only) column. In the Server Info example, this would be the name of a country, state, or city, depending on where the branch is in the tree's hierarchy.

def orderKey(self):

return self.name.lower()

def toString(self): return self.name def _len_(self):

return len(self.children)

The order key is a string that is used by the node's parent to position this branch correctly in the node's parent's list of children. The toString() method returns the branch's one field as a string. These methods are provided for compatibility with leaf nodes to make it easier to use either kind of node based on duck typing. The_len_() method returns how many children the branch has.

def childAtRow(self, row):

assert 0 <= row < len(self.children) return self.children[row][NODE]

This method returns the node for the given row. We have used an assert assert statement here, and in many other places in the tree of table model's code. The code can be tricky to get right, but by using assertions we can at least be clear about what we expect to be true at particular points in the code.

statement

def rowOfChild(self, child):

for i, item in enumerate(self.children): if item[NODE] == child: return i return -1

Here we return the row index of a particular child node, or -1 if the child is not one of this node's children.

def childWithKey(self, key): if not self.children: return None i = bisect.bisect_left(self.children, (key, None)) if i < 0 or i >= len(self.children):

return None if self.children[i][KEY] == key: return self.children[i][NODE] return None

We sometimes want to find the first child that has a given order key. One approach would be to do what we did in the rowOfChild() method, iterating through the list of children to find the right one. Here we have taken a more efficient approach. We find the position that a node with the given key ought to occupy, and if this is in range and has the right key, we return the child.

def insertChild(self, child): child.parent = self bisect.insort(self.children, (child.orderKey(), child))

This method inserts a new child node into a branch's list of children, and makes this branch the child's parent. By using bisect.insort() in conjunction with the child's order key, we ensure that the child is put in the correct position as quickly and efficiently as possible. The insort() function is identical to insort_right().

def hasLeaves(self):

if not self.children:

return False return isinstance(self.children[0], LeafNode)

In the tree of table model, a branch that has children has either branches or leaves, but not a mixture of both. For this reason, if a branch has no children at all, clearly it has no leaves; and similarly, if it does have children and the first one is a leaf, all of them are leaves.

We have now seen the entire branch node class. Next, we will look at the much shorter leaf node class.

class LeafNode(object):

def_init_(self, fields, parent=None):

super(LeafNode, self)._init_(parent)

self.parent = parent self.fields = fields

The fields in a leaf node are the node's columns.

def orderKey(self):

return u"\t".join(self.fields).lower()

def toString(self, separator="\t"): return separator.join(self.fields)

return len(self.fields)

A leaf node's order key is the tab-separated concatenation of its fields. Similarly, its toString() method returns a concatenation of its fields. The_len_()

method returns the number of fields; for branches it returns the number of children.

def field(self, column):

assert 0 <= column <= len(self.fields) return self.fields[column]

This method makes it easy to extract a single field's value while having the assertion that the field's column is within range.

def asRecord(self): record = [] branch = self.parent while branch is not None:

record.insert(0, branch.toString()) branch = branch.parent assert record and not record[0] record = record[1:] return record + self.fields

The notion of a record used by the tree of table model is the concatenation of all the branches from the root to the leaf's parent, plus the leaf itself—in other words, the user's complete choice "path". In terms of the Server Info application, this is the country, state, city, provider, server, and IP address, where the country, state, and city are branches, and each leaf contains three fields: provider, server, and IP address.

To construct a record (a list of fields), we begin with the leaf node's parent branch, and walk up the tree of branches. Each branch's string is prepended to the record list. The root branch has no string, so we remove that item from the list. The list that is returned is the concatenation of all the branch strings plus the leaf's strings.

We have now completed reviewing the nodes. The tree of table model is a QAbstractltemModel subclass and it reimplements many of the methods we would expect, such as data(), headerData(), rowCount(), and columnCount(). In addition, it provides the index(), parent(), and nodeFromIndex() methods which are usually reimplemented for tree models. It also has some extra methods, namely, load() and addRecord(); these are used to load tabular data and convert it into a tree of branches and leaves. We will begin by looking at the initializer, then the methods for loading the data, and then the standard model/view methods.

class TreeOfTableModel(QAbstractltemModel):

def_init_(self, parent=None):

super(TreeOfTableModel, self)._init_(parent)

self.columns = 0 self.root = BranchNode("") self.headers = []

The number of columns depends on the number of columns in the data that is loaded and on the level of nesting requested. There is always one root branch node that contains no text that is used purely as the parent of all the other branches. The headers are the text used as column headers.

def load(self, filename, nesting, separator): assert nesting > 0 self.nesting = nesting self.root = BranchNode("") exception = None fh = None try:

for line in codecs.open(unicode(filename), "rU", "utf8"): if not line: continue self.addRecord(line.split(separator), False) except IOError, e: exception = e finally:

if fh is not None:

for i in range(self.columns):

self.headers.append("Column #%d" % i) if exception is not None: raise exception

The file to be loaded must be a text file with one record per line, with each field separated by the specified separator. The file must be encoded as UTF-8 Unicode (or ASCII, since that is a subset of UTF-8). Blank lines are ignored; any other line is treated as a record and is added to the tree.

Once loading has finished (successfully or not), we call reset() to notify any views that the model has dramatically changed, and create some initial column headers. If the load failed, we then reraise the exception for the caller to handle. The columns variable is set to 0 in the initializer, and to a meaningful value in addRecord().

def addRecord(self, fields, callReset=True): assert len(fields) > self.nesting root = self.root branch = None for i in range(self.nesting): key = fields[i].lower() branch = root.childWithKey(key) if branch is not None:

root = branch else:

branch = BranchNode(fields[i]) root.insertChild(branch) root = branch assert branch is not None items = fields[self.nesting:]

self.columns = max(self.columns, len(items)) branch.insertChild(LeafNode(items, branch)) if callReset: self.reset()

To add a record there must be more fields than the level of nesting. The logic we use is similar to what we saw in Chapter 14 when we populated a QTree-Widget's internal model. For each field that is to be a branch we look for an existing branch with the same key. If we find one, we make it the current root branch; otherwise, we create a new branch, insert it as a child of the current root branch, and make the new branch the current root branch. As the loop progresses, we gradually walk down the tree, creating any branches that are needed, until we reach the lowest branch.

Once the loop has gone over all the branches that are necessary, creating any that did not previously exist, we can create a list of the non-nesting fields and add them as a child leaf node of the current (lowest-level) branch.

To put things in concrete terms, using the Server Info application as an example, here's what happens. When the first record is read we have a new country, new state, new city, new provider, and so on, so no suitable branches will exist. First a country branch will be created, then a state branch, and then a city branch, and finally a leaf containing the remaining provider, server, and IP address fields. If the next record read is for the same country, but for a new state, it will find the existing country node and use it as the parent node for the new state. Similarly, if a record has a country and state for which branches have already been created, these will be used. But whenever a new branch is needed the code in the loop's body will create it.

When new records are added on an ad hoc basis, we call reset() to notify any views that a significant change has taken place; but when loading from a file we pass False and call reset() in the calling code once all the records have been read.

def asRecord(self, index):

leaf = self.nodeFromIndex(index)

if leaf is not None and isinstance(leaf, LeafNode):

return leaf.asRecord() return []

This method provides a list of the user's chosen "path". It makes sense only for leaf nodes, since only a leaf node can represent a complete path. Returning None for nonleaf nodes would have been an equally good design choice. Notice that we use the nodeFromIndex() method to retrieve the node for a given model index: We will discuss how this works shortly.

def rowCount(self, parent):

node = self.nodeFromIndex(parent)

if node is None or isinstance(node, LeafNode):

return 0 return len(node)

For tree models the row count is the number of children that a particular node has. Our implementation allows only branch nodes to have children, so when called on leaf nodes we always return 0. The len() function calls Branch-Node._len_(), which returns the count of the branch's children.

def columnCount(self, parent): return self.columns

The number of columns is the maximum number of non-nested fields. This may appear to be one too few, but it is correct because the first non-nested field is shown in the first (tree) column.

def data(self, index, role):

if role == Qt.TextAlignmentRole:

return QVariant(int(Qt.AlignTop|Qt.AlignLeft)) if role != Qt.DisplayRole:

return QVariant() node = self.nodeFromIndex(index) assert node is not None if isinstance(node, BranchNode):

return QVariant(node.toString()) \

if index.column() == 0 else QVariant(QString("")) return QVariant(node.field(index.column()))

If the display data is requested for a branch node, we return the node's text for column 0 and an empty string for the other columns. For a leaf node, we return the field that corresponds to the requested column.

Prior to Qt 4.2, the default text alignment worked fine and did not need to be specified, but from Qt 4.2, we must explicitly return a sensible text alignment ourselves.

def headerData(self, section, orientation, role): if orientation == Qt.Horizontal and \ role == Qt.DisplayRole: assert 0 <= section <= len(self.headers) return QVariant(self.headers[section]) return QVariant()

Tree views have only horizontal (column) headers. They don't have row headers (e.g., row numbers), because these don't really make sense since each branch has its own 0-based list of children (rows).

def index(self, row, column, parent): assert self.root branch = self.nodeFromIndex(parent) assert branch is not None return self.createIndex(row, column, branch.childAtRow(row))

The index() method must return the model index for the data item with the given row and column and that is a child of the given parent. In a branches and leaves tree model, this means that we must return the model index of the parent item's row-th child.

We begin by finding the branch node of the given parent model index, and return a model index with the given row and column, and with a parent that is the (branch) node's row-th child node.

def parent(self, child):

node = self.nodeFromIndex(child) if node is None:

return QModelIndex() parent = node.parent if parent is None:

return QModelIndex() grandparent = parent.parent if grandparent is None: return QModelIndex() row = grandparent.rowOfChild(parent) assert row != -1

return self.createIndex(row, 0, parent)

The parent() method must return the model index of the given child's parent. In a branches and leaves tree model, this is the child's grandparent's row-th child.

We start by finding the child node's parent node's parent (that is, the child's grandparent). Then we return a model index that has the row the parent node occupies in the grandparent's list of children, column 0 (since all parents are branches and branches have only a zero-th column), and a parent that is the child's parent.

The reimplementations of the index() and parent() methods shown here are rather subtle. However, they are standard for tree models that take a branch and leaf approach, so their code can simply be copied "as is" in most cases.

def nodeFromIndex(self, index):

return index.internalPointer() \

if index.isValid() else self.root

When we call QAbstractItemModel.createIndex(), the third argument is a reference to a node. This reference is available from a model index and is returned by the internalPointer() method. For any given model index we return a branch or leaf node, or the branch root node.

Understanding tree models is more challenging than understanding table models (or list models, which are just tables with a single column). However, in many cases the difficulties can be reduced by building upon or adapting the code presented in this section.

Was this article helpful?

0 -1
Video Traffic Guru

Video Traffic Guru

You Can Drive THOUSANDS of Hungry Buyers to Your Offer. Over the last few months I've seen hundreds of video marketers struggle to make good money. Even though they put out video after video, they just aren't getting the kind of passive income they'd always wanted.

Get My Free Ebook


Post a comment