Hierarchical Markup Language

At the core of XML is a simple hierarchical markup language. Tags are used to mark off sections of content with different semantic meanings, and attributes are used to add metadata about the content.

Following is an example of a simple XML document that could be used to describe a library:

<?xml version="1.0"?>
<library owner="John Q. Reader">
  <book>
    <title>Sandman Volume 1: Preludes and Nocturnes</title>
    <author>Neil Gaiman</author>
  </book>
  <book>
    <title>Good Omens</title>
    <author>Neil Gaiman</author>
    <author>Terry Pratchett</author>
  </book>
  <book>
    <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
    <author>Harlan Ellison</author>
  </book>
</library>

Notice that every piece of data is wrapped in a tag, and that the tags nest in a hierarchy that conveys further information about the data they wrap. From the previous document, you can surmise that <author> and <title> are both children of <book>, and that <library> has an attribute called owner.

Unlike semantic markup languages like LaTeX, every piece of data in XML must be enclosed in tags. The top-level tag is known as the document root, which encloses everything in the document. An XML document can have only one document root.
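
As a small sketch of the single-root rule, suppose you wanted to describe two libraries in one document. Two <library> elements side by side at the top level would not be well-formed, so they need a common parent. The <libraries> element and the second owner's name below are invented purely for illustration.

```xml
<?xml version="1.0"?>
<!-- <libraries> is a hypothetical wrapper element: it is the one
     document root, and everything else lives inside it. -->
<libraries>
  <library owner="John Q. Reader">
    <book>
      <title>Good Omens</title>
      <author>Neil Gaiman</author>
      <author>Terry Pratchett</author>
    </book>
  </library>
  <!-- The second owner is made up for this example. -->
  <library owner="Jane Q. Reader">
    <book>
      <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
      <author>Harlan Ellison</author>
    </book>
  </library>
</libraries>
```

Removing the <libraries> wrapper would leave two top-level elements, which a conforming parser should reject.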

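Just before the document root is the XML declaration: <?xml version="1.0"?>. Strictly speaking, this is a declaration rather than an element. XML 1.0 does not require it, but it is recommended, and when present it must be the very first thing in the document. It lets the processor know that this is an XML document and which version of XML it uses. Nearly every document you encounter will use version 1.0 (XML 1.1 exists but is rarely used), so in practice the declaration can usually be ignored. If you need to support other versions of XML, you may need to parse the declaration to handle the document correctly.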

One problem with semantic markup is the possibility for confusion as data changes contexts. For instance, you might want to ship a list of book titles off to a database about authors. However, without a human to look at it, the database has no way of knowing that <title> means a book title, as opposed to an editor's business title or an author's honorific. This is where namespaces come in. A namespace is used to provide a frame of reference for tags and is given a unique ID in the form of a URL, plus a prefix to apply to tags from that namespace. For example, you might create a library namespace, with an identifier of http://server.domain.tld/NameSpaces/Library and with a prefix of lib: and use that to provide a frame of reference for the tags. With a namespace, the document would look like this:

<?xml version="1.0"?>
<lib:library owner="John Q. Reader"
    xmlns:lib="http://server.domain.tld/NameSpaces/Library">
  <lib:book>
    <lib:title>Sandman Volume 1: Preludes and Nocturnes</lib:title>
    <lib:author>Neil Gaiman</lib:author>
  </lib:book>
  <lib:book>
    <lib:title>Good Omens</lib:title>
    <lib:author>Neil Gaiman</lib:author>
    <lib:author>Terry Pratchett</lib:author>
  </lib:book>
  <lib:book>
    <lib:title>"Repent, Harlequin!" Said the Tick-Tock Man</lib:title>
    <lib:author>Harlan Ellison</lib:author>
  </lib:book>
</lib:library>

It's now explicit that the title element comes from a set of elements defined by a library namespace, and can be treated accordingly.

A namespace declaration can be added to any element in a document, and that namespace will be available to that element and to every one of its descendants. In most documents, all namespace declarations are placed on the root element, even if a namespace isn't used until deeper in the document. In this case, the namespace is applied to every tag in the document, including the root itself, so the namespace declaration must be on the root element.
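
To illustrate the scoping rule, here is a rough sketch in which the declaration is moved off the root and onto a single book. It reuses the names from the library example; only the placement of the declaration differs.

```xml
<?xml version="1.0"?>
<library owner="John Q. Reader">
  <!-- The declaration sits on this <lib:book>, so the lib: prefix is
       in scope for this element and its descendants only. -->
  <lib:book xmlns:lib="http://server.domain.tld/NameSpaces/Library">
    <lib:title>Good Omens</lib:title>
    <lib:author>Neil Gaiman</lib:author>
    <lib:author>Terry Pratchett</lib:author>
  </lib:book>
  <!-- Out here lib: is undeclared. Writing <lib:book> at this point
       would be rejected by a namespace-aware parser. -->
  <book>
    <title>"Repent, Harlequin!" Said the Tick-Tock Man</title>
    <author>Harlan Ellison</author>
  </book>
</library>
```

Declaring everything on the root avoids having to track where each prefix is valid, which is one reason that convention is so common.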

A document can have and use multiple namespaces. For instance, the preceding example library might use one namespace for library information and a second one to add publisher information.
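
A document along those lines might look roughly like the following. The pub: prefix, its namespace identifier, the <pub:publisher> and <pub:name> elements, and the publisher name are all assumptions made up for this sketch; they do not come from any real vocabulary.

```xml
<?xml version="1.0"?>
<lib:library owner="John Q. Reader"
    xmlns:lib="http://server.domain.tld/NameSpaces/Library"
    xmlns:pub="http://server.domain.tld/NameSpaces/Publisher">
  <lib:book>
    <lib:title>Good Omens</lib:title>
    <lib:author>Neil Gaiman</lib:author>
    <lib:author>Terry Pratchett</lib:author>
    <!-- Hypothetical publisher vocabulary, kept apart from the
         library vocabulary by its own prefix. -->
    <pub:publisher>
      <pub:name>Example Press</pub:name>
    </pub:publisher>
  </lib:book>
</lib:library>
```

Because each element carries its prefix, a processor that only understands the library namespace can skip the pub: elements without misreading them.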

Notice the xmlns: prefix for the namespace declaration. The xml: and xmlns: prefixes are reserved by XML itself and cannot be bound to namespaces of your own. Other prefixes, such as xsl: for XSLT, are not reserved but are used by strong convention for XML's associated languages, so it is best to avoid reusing them for anything else.

This is a fairly simple document. A more complex document might contain CDATA sections for storing unprocessed data, comments, and processing instructions for storing information specific to a single XML processor. For more thorough coverage of the subject, you may want to visit http://w3cschools.org or pick up Wrox Press's Beginning XML, 3rd Edition (0764570773) by David Hunter et al.
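
As a brief sketch of the three extra constructs mentioned above, the following document adds each of them to the library example. The stylesheet file name library.xsl and the <note> element are placeholders invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Processing instruction: addressed to one kind of processor
     (here, one that applies stylesheets); others can ignore it.
     The file name library.xsl is a placeholder. -->
<?xml-stylesheet type="text/xsl" href="library.xsl"?>
<library owner="John Q. Reader">
  <!-- A comment like this one is not part of the document's data. -->
  <book>
    <title>Good Omens</title>
    <author>Neil Gaiman</author>
    <author>Terry Pratchett</author>
    <!-- <note> is a hypothetical element. Inside the CDATA section,
         the < and & characters are plain text, not markup. -->
    <note><![CDATA[Shelved under <fantasy> & <humor>]]></note>
  </book>
</library>
```

Without the CDATA section, the same note would have to escape its special characters as &lt; and &amp; to keep the document well-formed.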
