Base Character Sets

Although there are several different character sets supported by XML, Unicode, and the different platforms, many are actually based on a few standard sets. The oldest standard is ASCII, which is the basis of most of the character sets supported by XML. In addition, most computers use the ISO-8859-1 (or Latin-1) character set to provide the extended characters needed by most Western European languages. Nearly all the ISO-8859 character sets are modifications of the ISO-8859-1 standard. All the ISO-8859...
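To make the "shared base, different extensions" idea concrete, here is a small Python sketch (Python is used purely for illustration). It decodes one byte string with ISO-8859-1 and with ISO-8859-15, a later revision of Latin-1. The ASCII bytes come out the same under both, while the upper-half byte 0xA4 maps to different characters.

```python
# Four bytes: "Hi " in ASCII followed by the upper-half byte 0xA4
raw = bytes([0x48, 0x69, 0x20, 0xA4])

latin1 = raw.decode("iso-8859-1")   # ISO-8859-1 (Latin-1)
latin9 = raw.decode("iso-8859-15")  # ISO-8859-15 (Latin-9), a revision of Latin-1

# The ASCII part decodes identically under both character sets
print(latin1[:3] == latin9[:3])                  # True
# Byte 0xA4 is the generic currency sign in Latin-1 but the euro sign in Latin-9
print(hex(ord(latin1[3])), hex(ord(latin9[3])))  # 0xa4 0x20ac
```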

Building a Simple XML Parser

As you already know, the XML parser available within PHP is based on the Expat library. The Expat library uses callback functions that are executed when the different components of the document are identified. A number of different components make up an XML document, but the primary ones that all XML parsers are capable of handling are the start tag (such as <data>), the end tag (</data>), and character data (any text that isn't part of a tag). The full process for building an XML parser...
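The same callback model can be seen outside PHP. The following is a minimal sketch using Python's standard xml.parsers.expat module, which wraps the same Expat library; it is not the PHP code from this chapter. One function is registered for each of the three primary components, and each call is recorded so the order of events is visible.

```python
import xml.parsers.expat

events = []  # (event, value) pairs in the order Expat reports them

def start_element(name, attrs):
    # Called for each start tag, such as <data>; attrs is a dict of attributes
    events.append(("start", name))

def end_element(name):
    # Called for each end tag, such as </data>
    events.append(("end", name))

def char_data(data):
    # Called for text between tags; Expat may split one run of text across calls
    events.append(("chars", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

parser.Parse("<data>Hello</data>", True)  # True marks this as the final chunk

for event in events:
    print(event)
```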

Combining DTML and XML Resources

Zope itself doesn't understand XML (except when importing a previous Zope object export in XML format), but that doesn't mean that you can't work with XML and other formats. For example, you can use the built-in features of DTML and Zope to export a DTML resource in XML format. For this, you first need a Zope project to work with. For the examples in this entire chapter, you'll be working with a very simple logging project that allows you to enter a title and message, which is logged...

Inside SAX Processing

The XML::Parser::PerlSAX module is a PerlSAX-compliant system for XML processing. As you saw in Chapter 6, "XML Solutions in Perl," PerlSAX is class-based: you produce a handler that deals with the different components of an XML document as it is processed. As with most other event-based systems, the class is used to define the methods that will be called when the different XML element components are identified. The method itself supports only three methods...
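Python's standard xml.sax package follows the same class-based pattern, so it makes a convenient illustration; this is not PerlSAX code. You subclass ContentHandler and define startElement, endElement, and characters, and the parser calls them as it meets each component. The TitleHandler class is hypothetical and simply collects the text of every <title> element.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def endElement(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, content):
        # May be called several times for one run of text, so append
        if self.in_title:
            self.titles[-1] += content

handler = TitleHandler()
xml.sax.parseString(b"<doc><title>One</title><title>Two</title></doc>", handler)
print(handler.titles)
```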

Inside the XML Parser

At the risk of repeating myself, the XML parser built into PHP is based on Expat libraries. The standard PHP 4.x distributions now include the source for Expat and the extensions for PHP itself to handle the XML processing, and XML should be enabled by default when you configure and build the system. Information sent to the parser is handled entirely by the parser and the functions that you create to handle the different elements. There's no way to interrupt the flow of parsing and execute...

Marking an Item Completed

To mark an item as completed, you need to update the status tag's character data to Done instead of the default Open. The basics are identical to adding a new item: you find the entry you are looking for (using the item's id attribute) and set its text, then you repeat the procedure for dumping the XML as text to write the new document before printing the to-do list summary again. The full script is shown in Listing 21.3.

Listing 21.3 Crossing an Item Off the List

require "rexml/document"
require...
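The same find-and-update steps can be sketched with Python's xml.etree.ElementTree. This is not the listing's code. It assumes a to-do document in which each <item> carries an id attribute and has <title> and <status> children; that layout, and the mark_done helper, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed to-do document structure; the real file layout may differ.
TODO_XML = """<todo>
  <item id="1"><title>Write chapter</title><status>Open</status></item>
  <item id="2"><title>Test scripts</title><status>Open</status></item>
</todo>"""

def mark_done(root, item_id):
    """Set the <status> text of the item with the given id attribute to 'Done'."""
    item = root.find(f"item[@id='{item_id}']")
    if item is None:
        raise KeyError(f"no item with id {item_id}")
    item.find("status").text = "Done"

root = ET.fromstring(TODO_XML)
mark_done(root, "2")

# Dump the updated document back out as text
print(ET.tostring(root, encoding="unicode"))

# Print a short summary of the list
for item in root.iter("item"):
    print(item.get("id"), item.findtext("title"), item.findtext("status"))
```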

Listing: An XMLtoHTML Converter

The elements hash holds the configuration information for the XML tags found by the parser. The tags it outputs are HTML. Because an individual XML tag can generate multiple HTML tags, the base key links to a list. Within the list are individual hash references for each HTML tag, and each hash contains the tag and its attributes. For example, a <title> XML tag produces <tr><td bgcolor="000094" align="left"><font face="Arial, Helvetica" color="ffffff"><b> 'video' => , 'title' => tag =>...
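The configuration-driven approach can be approximated in Python, with a dictionary standing in for the Perl hash; this is a sketch, not the listing's code. ELEMENTS, open_tags, close_tags, and convert are hypothetical names, and the attribute values are taken from the <title> example above. Each XML tag maps to a list of HTML tag descriptions. Opening tags are emitted in order when the XML element starts, and closing tags are emitted in reverse order when it ends, so the generated HTML nests correctly.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping: each XML tag maps to a list of HTML tags,
# each described by its tag name and a dict of attributes.
ELEMENTS = {
    "title": [
        {"tag": "tr", "attrs": {}},
        {"tag": "td", "attrs": {"bgcolor": "000094", "align": "left"}},
        {"tag": "font", "attrs": {"face": "Arial, Helvetica", "color": "ffffff"}},
        {"tag": "b", "attrs": {}},
    ],
}

def open_tags(name):
    parts = []
    for spec in ELEMENTS.get(name, []):
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in spec["attrs"].items())
        parts.append(f"<{spec['tag']}{attrs}>")
    return "".join(parts)

def close_tags(name):
    # Close in reverse order so the HTML nests correctly
    return "".join(f"</{spec['tag']}>" for spec in reversed(ELEMENTS.get(name, [])))

def convert(xml_text):
    out = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: out.append(open_tags(name))
    parser.EndElementHandler = lambda name: out.append(close_tags(name))
    parser.CharacterDataHandler = lambda data: out.append(escape(data))
    parser.Parse(xml_text, True)
    return "".join(out)

print(convert("<title>Star Wars</title>"))
```

XML tags with no entry in ELEMENTS produce no HTML of their own; only their text is copied through.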

Parsing Quick Reference

The XML Tools parser, you may not be surprised to discover, is based on the Expat parser that is the basis of so many other parsing interfaces we've covered in this book. Because of this, the parser is completely Unicode compliant and should process Unicode text in most of the standard forms without any problems. The parse XML statement accepts a number of different parameters that control the parsing process. A list of these parameters is shown in Table 23.1. All the parameters are optional,...

PHP and XML-RPC

XML-RPC is a technology that transfers an XML document over a transport (usually HTTP) to request that a function on a remote machine be executed. The function called can be any supported function, and you can supply arguments to it just as you would to a local function. Also, just like a local function, the results of the function call are sent back over the transport link to the caller. XML-RPC is useful in those situations when you want to execute a piece of code on a remote machine...
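To see what the request document actually looks like, Python's standard xmlrpc.client module can build one without contacting any server. This sketch is illustrative only. The method name examples.getStateName and its integer argument are hypothetical. dumps() produces the <methodCall> document a client would send, and loads() shows how the receiving side turns it back into a method name and arguments.

```python
import xmlrpc.client

# Build the XML document for a call to a hypothetical remote function
# "examples.getStateName" with one integer argument.
request = xmlrpc.client.dumps((41,), methodname="examples.getStateName")
print(request)

# The receiving side decodes the same document back into arguments and method name
params, method = xmlrpc.client.loads(request)
print(method, params)
```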

Processing an RSS Feed to HTML

We can use the XML parser in combination with the URL Access Scripting extension (a standard part of the Mac OS) to download RDF RSS news summary files from websites and parse them into an HTML document. We can then open that document in Internet Explorer or Netscape Navigator to browse and link to the stories. For our example, we'll use the Macintosh News Network site (http://macnn.com), which publishes an RSS feed at http://www.macnn.com/macnn.rdf. A fragment of the resulting RSS feed can be seen in Listing...
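The parse-and-convert half of this process can be sketched in Python with xml.etree.ElementTree. This is not the AppleScript version, and it skips the download step. It parses an inline sample in RSS 1.0 (RDF) form, which is made up for illustration and is not the MacNN feed. Items in RSS 1.0 live in the http://purl.org/rss/1.0/ namespace, so element names must be qualified with it.

```python
import xml.etree.ElementTree as ET
from html import escape

RSS = "{http://purl.org/rss/1.0/}"  # RSS 1.0 namespace used by RDF feeds

# Made-up sample in RSS 1.0 (RDF) form; a real feed has more elements.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/story1">
    <title>First story</title>
    <link>http://example.com/story1</link>
  </item>
  <item rdf:about="http://example.com/story2">
    <title>Second &amp; last</title>
    <link>http://example.com/story2</link>
  </item>
</rdf:RDF>"""

def rss_to_html(xml_text):
    """Turn each RSS item into a linked list entry in a simple HTML page."""
    root = ET.fromstring(xml_text)
    lines = ["<html><body><ul>"]
    for item in root.iter(RSS + "item"):
        title = item.findtext(RSS + "title", default="")
        link = item.findtext(RSS + "link", default="")
        lines.append(f'<li><a href="{escape(link)}">{escape(title)}</a></li>')
    lines.append("</ul></body></html>")
    return "\n".join(lines)

print(rss_to_html(SAMPLE))
```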

Processing XML as Markup

The easiest way around this limitation is to tell REBOL to load the XML document in markup mode, using an option to the load function to tell it to parse the XML tags and character data into separate blocks. You can demonstrate this quite easily using the following script:

xmlsource: load/markup %simple.xml
probe xmlsource

When used on the following XML file:

<simple>
<title>Some Other Title</title>
<paragraphs>
<paragraph refid="p1">Some text</paragraph>
<paragraph...
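The idea behind markup mode, splitting a document into a flat sequence of tag strings and text strings, can be approximated in a few lines of Python. This is a rough illustration, not REBOL's implementation, and the load_markup function is hypothetical. The regular expression is deliberately naive: it does not handle comments, CDATA sections, or a > character inside attribute values.

```python
import re

def load_markup(text):
    """Split markup into a flat list of tag strings and text strings.
    A rough approximation of markup-mode loading, not REBOL's behavior.
    Whitespace between tags is kept as text."""
    # re.split with a capturing group keeps the matched tags in the result
    pieces = re.split(r"(<[^>]*>)", text)
    return [p for p in pieces if p != ""]

print(load_markup("<title>Some Other Title</title>"))
# ['<title>', 'Some Other Title', '</title>']
```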

The XML Export Format

Despite its heavy web service and integration focus, Zope doesn't actually include the built-in capability to parse and process XML documents. That doesn't mean that it's totally ignorant of XML. Once you have created a folder or collection in Zope, you can export the folder object into an export file. The normal format for this is a binary Zope export format that uses the Python pickle and cPickle modules to dump Zope objects out to the file. This export format in Zope is exceedingly useful...
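A short pickle round trip shows why this binary format is convenient inside Python but opaque to anything else. This is not Zope's export code. The dictionary is a hypothetical stand-in for an object's properties; real Zope objects are far richer.

```python
import pickle

# A hypothetical stand-in for an object's properties
folder = {"id": "logs", "title": "Log entries",
          "entries": [("Started", "First message")]}

data = pickle.dumps(folder)   # binary, Python-specific byte stream
restored = pickle.loads(data)

print(type(data).__name__)    # bytes
print(data[:2])               # a binary header, not readable markup
print(restored == folder)     # True
```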

The XML Tools Dictionary

As with most AppleScript extensions, you can get more information and examples about the XML Tools by opening the XML Tools dictionary in your AppleScript editor. For example, using the standard Script Editor application (in the AppleScript directory of the Apple Extras directory on your hard disk), select File > Open Dictionary and find the XML Tools extension. You should get a window like the one shown in Figure 23.4. Late Night Software also provides an XML-RPC extension that enables you to...

Using the HTML Builder Class with DOM

Working with DOM is a matter of accessing the tags that you want to use, either by referencing them directly or by walking through the structure of nodes and their children to extract the information that you want from the document. In the case of the client bank accounts XML file, you know that there are four different areas to the XML document. These are the main client name, the list of accounts, the information for each account, and the list of transactions for a given account. You can...
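The DOM access side of this can be sketched with Python's xml.dom.minidom; the HTML Builder class itself is not reproduced. The element and attribute names below (client, name, account, number, type, transaction, amount) are an assumed layout for the client bank accounts file, and the book's file may differ. The sketch references the client name directly, then walks each account and its transactions.

```python
from xml.dom import minidom

# Assumed layout for the client bank accounts file; the real file may differ.
ACCOUNTS_XML = """<client>
  <name>A. Client</name>
  <accounts>
    <account number="1001" type="checking">
      <transactions>
        <transaction amount="-20.00">Groceries</transaction>
        <transaction amount="500.00">Salary</transaction>
      </transactions>
    </account>
  </accounts>
</client>"""

def text_of(node):
    """Concatenate the text node children of an element."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)

doc = minidom.parseString(ACCOUNTS_XML)
client = doc.documentElement
print("Client:", text_of(client.getElementsByTagName("name")[0]))

for account in client.getElementsByTagName("account"):
    print("Account", account.getAttribute("number"), account.getAttribute("type"))
    for tx in account.getElementsByTagName("transaction"):
        print("  ", tx.getAttribute("amount"), text_of(tx))
```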

XML Character Set Names

Unicode itself defines a single character set, but the XML specification requires parsers to support only two of its encodings: UTF-8 and UTF-16. In addition, a number of existing character sets standardized by the ISO are also supported by most XML parsers. The exact list of character sets supported depends on your parser, but all should support the basic Unicode encodings as well as the ISO-8859-1 set. UTF-8 and ISO-8859-1 match the ASCII set for the first 128 characters (codes 0 through 127). In essence, if you need access to a full range of characters from all...
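A quick Python check illustrates the ASCII compatibility described above (Python is used only for illustration). Pure ASCII text encodes to identical bytes under ASCII, UTF-8, and ISO-8859-1. Once a character outside the ASCII range appears, the encodings diverge: Latin-1 stores é as one byte, while UTF-8 needs two.

```python
text = "Hello"            # pure ASCII (code points 0 through 127)
accented = "caf\u00e9"    # "café"; é is U+00E9, outside ASCII

# For ASCII-only text the three encodings produce identical bytes
print(text.encode("ascii") == text.encode("utf-8") == text.encode("iso-8859-1"))  # True

# Beyond ASCII they differ: Latin-1 uses one byte for é, UTF-8 uses two
print(accented.encode("iso-8859-1"))  # b'caf\xe9'
print(accented.encode("utf-8"))       # b'caf\xc3\xa9'
```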

XML Solutions in Python

Python's XML support is probably one of the most complex of the different solutions available, largely because of the way in which the different XML parsers have been developed. The original XML parsing system provided with Python 1.5.2 is called xmllib, and it comes as standard with all Python distributions. xmllib was developed on the same basis as the sgmllib module, which provides SGML parsing tools. The xmllib parser is both a simple validation parser and an event-driven data parser that...

Using xmlproc for Validation

When working with any kind of XML document, there are always issues relating to the validity of the document being parsed. Most of the parsers, including Expat and SAX (which is used both directly and to build the object model for the Python DOM implementation), will check a document for well-formedness. This involves simply checking that start tags have corresponding end tags and that tags don't overlap each other (that is, that end tags don't appear in a different order from their start tags). These basic checks are...
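The basic well-formedness check can be demonstrated with Python's xml.parsers.expat. This sketch does not use xmlproc and performs no DTD validation. Expat raises ExpatError for overlapping tags and for a missing end tag. The is_well_formed helper is hypothetical.

```python
import xml.parsers.expat

def is_well_formed(xml_text):
    """Return (True, None) if Expat accepts the document, else (False, message).
    This checks well-formedness only; it does not validate against a DTD."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None

print(is_well_formed("<a><b>text</b></a>"))
print(is_well_formed("<a><b>text</a></b>"))   # overlapping tags
print(is_well_formed("<a><b>text</b>"))       # missing end tag
```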

Tcl and Unicode

The TclXML parser will read Unicode-encoded documents directly, so if you need to identify or display either entities or character data, you will need to be able to translate between Unicode formats. Tcl 8.1 and later include the encoding command, which will convert strings between the different encoding formats for you. See the following sidebar for information on determining which encodings are supported by your system. To determine which encodings are supported by your Tcl installation,...
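For comparison, the same two-step conversion (decode from one encoding into a string, then encode into another) looks like this in Python; it is a parallel to Tcl's encoding convertfrom and encoding convertto, not Tcl code. The encoding_supported helper is hypothetical. It asks Python's codec registry whether a name is known, which is roughly what scanning the list from Tcl's encoding names gives you.

```python
import codecs

utf8_bytes = "Gr\u00fc\u00dfe".encode("utf-8")   # "Grüße" as UTF-8 bytes

# Decode from one encoding into a string, then encode into another
text = utf8_bytes.decode("utf-8")
utf16_bytes = text.encode("utf-16-be")           # big-endian, no byte order mark

print(len(utf8_bytes), len(utf16_bytes))         # 7 10

def encoding_supported(name):
    """Return True if Python's codec registry knows the named encoding."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

print(encoding_supported("iso-8859-1"), encoding_supported("no-such-encoding"))  # True False
```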