Parsing XML Files with Python

There are two ways of parsing an XML document. One method is called SAX, or Simple API for XML. Before you process XML with SAX you need to define a callback function for each tag that you are interested in. You then call a SAX method to parse the XML. The parser will then read the XML file one line at a time and call a registered method for each recognized element.

Another method, which I'm going to use in my example, is called the DOM or Document Object Model. Unlike SAX, the DOM parser reads the whole XML document into memory, parses it, and builds an internal representation of that document. By nature, XML documents represent a tree-like structure, with node elements that contain child or branch elements, and so on. So the DOM parser builds a tree-like linked data structure and provides you with methods of traversing through this tree structure.

There are three basic steps in finding the information in an XML document: parse the XML document, find a tree node that contains the elements that interest you, and finally read their values or contents.

Parsing an XML document is really simple and only takes one line of code (two if you count the include statement). The following code reads in the whole configuration file and creates an XML parser object that later can be used to find information.

Was this article helpful?

0 0

Post a comment