XML-Based

Arguments abound for and against XML in the arena of data representation. XML is extremely well suited to Jabber, and Jabber to XML, for several reasons.

The alternatives for representing data in Jabber are binary and ASCII text. Binary? Well, perhaps binary data is more space efficient, but where is that advantage in the general scheme of things these days? Near the bottom of my list, anyway, especially as it always comes at the cost of readability. ASCII? Well, yes, of course, ASCII is human readable, but because the Jabber data flow consists of a series of conversational chunks (independent constructions in their own right), we need some sort of boundary mechanism to separate these chunks. XML affords us a very nice way of packaging individual chunks of data and giving their content meaning and context. These individual chunks of information have structure too, and this structure doesn't require any fixed-length madness either; XML allows the chunks, or fragments, to bend and stretch as required, while still retaining their meaning.
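
To make the boundary idea concrete, here is a minimal sketch in Python using the standard library's incremental XML parser. It assumes a simplified, hypothetical <stream> root element with no namespaces, and the data arrives in arbitrarily split pieces, as it might from a socket. The function name stanzas and the sample data are illustrative only and are not part of Jabber.

```python
import xml.etree.ElementTree as ET

# Hypothetical data flow, split at arbitrary points as a socket might deliver it.
PIECES = [
    "<stream><message><body>Hel",
    "lo</body></message><pres",
    "ence/><message><body>Bye</body></message>",
]

def stanzas(pieces):
    """Yield each complete child of the root element as soon as it closes."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for piece in pieces:
        parser.feed(piece)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                # Depth 1 means we are back directly under the root,
                # so one whole chunk has just been completed.
                if depth == 1:
                    yield elem

for chunk in stanzas(PIECES):
    print(chunk.tag, chunk.findtext("body"))
```

The closing tags do the work that a length prefix or delimiter would have to do in a plain ASCII protocol, and no chunk needs a fixed size.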

This flexibility also comes in the form of extensibility. It's straightforward to add distinct "extensions" to a fragment in a way that does not compromise the integrity of that fragment and that gives the added extension a structure of its own.

So why reinvent the wheel when there are tools that can be taken off the shelf to parse the data? There are many tried and tested XML libraries out there, and to be able to receive (from the parser) the Jabber data in a native format of your choice is a definite advantage.

Some of these arguments, concerning XML fragments and extensibility, will become clearer in Chapter 5. Until then, consider that Jabber makes good use of an XML feature called namespaces. [1] Namespaces are used in XML to segregate, or qualify, individual chunks of data, giving tags a reference to which they belong. What the reference is matters less than the delineation it provides, which allows us to manage content within an XML fragment that is logically divided into subfragments. Consider Example 2-1, which shows a section of the imaginary conversation from Chapter 1.

Example 2-1. Qualifying a fragment extension with a namespace

<message type='chat' from='jim@company-a.com/home'
    to='john@company-b.com/Desk'>
  <thread>01</thread>
  <body>Here's the link</body>
  <x xmlns='jabber:x:oob'>
    <url>http://www.megacorp.co.uk/earnings3q.html</url>
    <desc>Third Quarter Earnings for Megacorp</desc>
  </x>
</message>

The main part of the fragment is the <message/> element containing the <thread/> and <body/> tags. The <message/> element is the "carrier" part of the fragment:

<message type='chat' from='jim@company-a.com/home'
    to='john@company-b.com/Desk'>
  <thread>01</thread>
  <body>Here's the link</body>
</message>

But the fragment has been embellished by an extension that is qualified by the 'jabber:x:oob' namespace:

<x xmlns='jabber:x:oob'>
  <url>http://www.megacorp.co.uk/earnings3q.html</url>
  <desc>Third Quarter Earnings for Megacorp</desc>
</x>

The xmlns attribute of the <x/> tag declares that the tag, and any children of that tag, belong to, or are qualified by, the jabber:x:oob namespace. [2] This namespace is different to the namespace that qualifies the carrier <message/> tag, and the other elements <presence/> and <iq/> that appear at the same level. The namespace that qualifies these tags is not explicitly specified as an xmlns attribute; rather it is declared when the XML stream is established. It is over this XML stream that these elements flow. See Chapter 5 for more details on XML streams and namespaces.
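
The inheritance of a stream-level namespace can be sketched with Python's standard xml.etree.ElementTree module. This sketch assumes the stream root declares jabber:client as the default namespace and http://etherx.jabber.org/streams for the stream prefix; neither declaration appears in this section. It also closes the stream immediately so that the text parses as one document, which a live stream would not do. The helper namespace_of is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed stream wrapper: the default namespace is declared once, on the root.
STREAM = """<stream:stream xmlns='jabber:client'
    xmlns:stream='http://etherx.jabber.org/streams'>
  <message type='chat' from='jim@company-a.com/home'
      to='john@company-b.com/Desk'>
    <thread>01</thread>
    <body>Here's the link</body>
    <x xmlns='jabber:x:oob'>
      <url>http://www.megacorp.co.uk/earnings3q.html</url>
      <desc>Third Quarter Earnings for Megacorp</desc>
    </x>
  </message>
</stream:stream>"""

def namespace_of(element):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if not element.tag.startswith("{"):
        return "", element.tag
    uri, _, local = element.tag[1:].partition("}")
    return uri, local

root = ET.fromstring(STREAM)
# Map each local tag name to the namespace that qualifies it.
qualified = {local: uri for uri, local in map(namespace_of, root.iter())}

for local, uri in qualified.items():
    print(f"{local:8} {uri}")
```

Under these assumptions, <message/>, <thread/>, and <body/> pick up the namespace declared on the stream, while <x/> and its children are qualified by jabber:x:oob.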

The general point is that the jabber:x:oob qualified extension is recognizable as an extension (by us, and more importantly, by the XML parser) and can be dealt with appropriately—we are likely to want to handle the information contained in the extension separately from the rest of the message.
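
As a rough illustration of handling the extension separately, the following Python sketch parses the fragment from Example 2-1 on its own, outside any stream, so the carrier tags have no namespace here. It uses the standard xml.etree.ElementTree module; the function name split_message and the shape of its return value are assumptions made for this example, not part of any Jabber library.

```python
import xml.etree.ElementTree as ET

OOB_NS = "jabber:x:oob"

FRAGMENT = """<message type='chat' from='jim@company-a.com/home'
    to='john@company-b.com/Desk'>
  <thread>01</thread>
  <body>Here's the link</body>
  <x xmlns='jabber:x:oob'>
    <url>http://www.megacorp.co.uk/earnings3q.html</url>
    <desc>Third Quarter Earnings for Megacorp</desc>
  </x>
</message>"""

def split_message(xml_text):
    """Return (body_text, oob_info); oob_info is None if no extension is present."""
    message = ET.fromstring(xml_text)
    body = message.findtext("body")
    # ElementTree addresses namespaced tags as '{namespace}localname'.
    oob = message.find(f"{{{OOB_NS}}}x")
    if oob is None:
        return body, None
    info = {
        "url": oob.findtext(f"{{{OOB_NS}}}url"),
        "desc": oob.findtext(f"{{{OOB_NS}}}desc"),
    }
    return body, info

body, oob_info = split_message(FRAGMENT)
print(body)
print(oob_info)
```

Because the lookup is keyed on the namespace rather than on the tag name alone, a message carrying some other <x/> extension would simply yield no out-of-band information.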

So Jabber uses the extensible XML format to contain and carry data between endpoints.

"XML between endpoints"? That sounds rather generic to me—not something that's limited to providing an IM experience. Indeed, that's the whole idea.

"XML Router" is a moniker often used to describe Jabber, by people who have made this logical leap. Remove the IM mantle, and underneath we find a system, an architecture, capable of being deployed to exchange and distribute all manner of XML-encoded data.

Notes

[1] http://www.w3.org/TR/REC-xml-names/

[2] This namespace is used in Jabber to carry information about "out of band" (OOB) data: data that moves outside of the main client-server-client pathways. When a client sends a file directly to another client without sending that file via the server, this is said to be "out of band".