XML Solutions in Python

• The xmllib Module

• Parsing Using Expat

• Parsing Using SAX

• Parsing Using DOM

Python's XML support is probably one of the most complex of the different solutions available, largely because of the way in which the different XML parsers have been developed. The original XML parsing system provided with Python 1.5.2 is called xmllib, and it comes as standard with all Python distributions. xmllib was developed on the same basis as the sgmllib module, which provides SGML parsing tools.

The xmllib parser is both a simple validation parser and an event-driven data parser that provides the base methods you use to parse an XML document. To use it, you create a new class that inherits from the XMLParser class in the xmllib module and supply the methods needed to trap start and end tags, data sections, and entities.

Python 2.0 introduced a completely new hierarchy of modules and packages for developing with XML. The base xml package now includes xml.dom for processing using the DOM, xml.sax for providing an event-driven parser, and xml.parsers.expat for an interface to the generic Expat parser used by many other languages. In addition, the xmllib module is still available as part of the standard Python library, but its use and support have been deprecated in favor of the superior xml.sax package.
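As a rough illustration of how the three interfaces in the xml package differ, the sketch below reads the same small document with each one. It assumes a current Python 3 interpreter, where xmllib is no longer available; the sample document and the names DOCUMENT, TagCollector, parse_with_sax, parse_with_expat, and parse_with_dom are invented for this example and do not come from the book.

```python
import xml.parsers.expat
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
DOCUMENT = "<catalog><book id='1'>Python</book><book id='2'>XML</book></catalog>"


class TagCollector(xml.sax.ContentHandler):
    """SAX handler that records start tags, end tags, and non-blank text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver text in more than one chunk.
        if content.strip():
            self.events.append(("text", content))


def parse_with_sax(text):
    """Event-driven parsing: xml.sax calls the handler methods as it reads."""
    handler = TagCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def parse_with_expat(text):
    """Direct use of Expat: callbacks are assigned as parser attributes."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return names


def parse_with_dom(text):
    """Tree-based parsing: the whole document is loaded, then queried."""
    dom = minidom.parseString(text)
    return [book.getAttribute("id") for book in dom.getElementsByTagName("book")]


if __name__ == "__main__":
    print(parse_with_sax(DOCUMENT))
    print(parse_with_expat(DOCUMENT))
    print(parse_with_dom(DOCUMENT))
```

The SAX and Expat versions react to the document as it streams past, while the DOM version builds the complete tree first and then searches it. That difference is usually the main factor when choosing between them.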

We'll look briefly at each of these systems here before taking a closer look at specific solutions in later chapters in this part of the book.
