How does the HTML parser work?

The HTML window has its own internal parser within wxPython. There are actually two parser classes, one a refinement of the other. Working with the parsers directly is generally only useful if you want to extend the functionality of wx.html.HtmlWindow itself. If you are programming in Python and want an HTML parser for other purposes, we recommend using one of the two parser modules distributed with Python (htmllib and HTMLParser; in Python 3, htmllib has been removed and HTMLParser is available as html.parser), or an external Python tool like Beautiful Soup. We'll cover the parser only in enough detail to give you the basics needed to add your own tag type.
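
For general-purpose parsing outside of wx.html.HtmlWindow, a standard-library parser is usually enough. The following is a minimal sketch, assuming Python 3, where the HTMLParser module is available as html.parser. The LinkCollector class and the extract_links function are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(source):
    """Return the href targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return collector.links
```

Calling extract_links() on a page's HTML string returns a plain list of link targets, with no wxPython objects involved, which is why this route tends to be simpler when nothing needs to be displayed.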

The two parser classes are wx.html.HtmlParser, which is the more generic parser, and wx.html.HtmlWinParser, which is a subclass of wx.html.HtmlParser with extensions specifically created to support displaying text in a wx.html.HtmlWindow. Since we're mostly concerned with HTML windows here, we'll focus on the subclass.

To create an HTML parser, use one of two constructors. The basic one, wx.html.HtmlWinParser(), takes no arguments. The parent wx.html.HtmlParser class also has a no-argument constructor. To associate a wx.html.HtmlWinParser with an existing wx.html.HtmlWindow, use the other constructor, wx.html.HtmlWinParser(wnd), where wnd is the instance of the HTML window.

The simplest way to use the parser is to call its Parse(source) method. The source parameter is the HTML string to be processed, and the return value is the parsed data. For a wx.html.HtmlWinParser, the return value is an instance of the class wx.html.HtmlCell.

The HTML parser converts the HTML text into a series of cells, where a cell is some meaningful fragment of the HTML. A cell can represent some text, an image, a table, a list, or any other specific element. The most significant subclass of wx.html.HtmlCell is wx.html.HtmlContainerCell, which is simply a cell that can contain other cells within it, such as a table, or a paragraph with different text styles. For nearly every document that you parse, the return value will be a wx.html.HtmlContainerCell. Each cell has a Draw(dc, x, y, view_y1, view_y2) method, which allows it to draw its information in the HTML window.
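
The containment relationship between cells can be pictured with a toy model in plain Python. This is not the wx.html API: TextCell, ContainerCell, insert_cell(), and describe() are hypothetical stand-ins, meant only to show how a container cell delegates to the cells it holds, much as a real container's Draw() method would draw its children.

```python
class TextCell:
    """Toy stand-in for a leaf cell holding a fragment of text."""

    def __init__(self, text):
        self.text = text

    def describe(self, depth=0):
        # A leaf reports itself on one indented line.
        return ["  " * depth + "text: " + self.text]


class ContainerCell:
    """Toy stand-in for a cell that contains other cells."""

    def __init__(self, label):
        self.label = label
        self.children = []

    def insert_cell(self, cell):
        # Append a child and return it so nested containers can be chained.
        self.children.append(cell)
        return cell

    def describe(self, depth=0):
        # A container reports itself, then delegates to each child in order.
        lines = ["  " * depth + "container: " + self.label]
        for child in self.children:
            lines.extend(child.describe(depth + 1))
        return lines


# A paragraph with two text styles, nested inside a document-level container.
document = ContainerCell("document")
paragraph = document.insert_cell(ContainerCell("paragraph"))
paragraph.insert_cell(TextCell("plain words, "))
paragraph.insert_cell(TextCell("bold words"))

print("\n".join(document.describe()))
```

The outermost object here is a container, which mirrors why parsing nearly any document yields a container cell at the top level.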

Another important cell subclass is wx.html.HtmlWidgetCell, which allows an arbitrary wxPython widget to be inserted into an HTML document just like any other cell. This can include any kind of widget used to manage HTML forms, but can also include static text used for formatted display. The only interesting method of wx.html.HtmlWidgetCell is the constructor.

wx.html.HtmlWidgetCell(wnd, w=0)

In the constructor, the wnd parameter is the wxPython widget to be drawn. The w parameter is a floating width. If it is not 0, it must be an integer between 1 and 100, and the width of the wnd widget is then dynamically adjusted to be that percentage of the width of its parent container.
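
The percentage rule can be illustrated with a little arithmetic. The floating_width helper below is a hypothetical function, not part of wxPython. It assumes that a w of 0 leaves the widget at its own width and that the result is rounded down to whole pixels, neither of which is spelled out above.

```python
def floating_width(widget_width, parent_width, w=0):
    """Return the width a widget would get under the floating-width rule.

    Hypothetical illustration only; wx.html performs its own layout.
    """
    if w == 0:
        # No floating width requested: keep the widget's own width.
        return widget_width
    if not 1 <= w <= 100:
        raise ValueError("w must be 0 or an integer between 1 and 100")
    # Assumption: round down to whole pixels.
    return parent_width * w // 100
```

Under these assumptions, a widget with w=50 inside a 400-pixel-wide container would be laid out 200 pixels wide, and it would be resized whenever the container's width changes.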

There are many other cell types that are used to display the more typical parts of an HTML document. For more information regarding these other cell types, refer to the wxWidgets documentation.
