By now, you should already know that Jabber relies heavily on XML. XML courses through Jabber's veins; data sent and received between entities, and internally within the server itself, is formatted in XML packets.
However, the XML philosophy goes further than this. A connection between two Jabber endpoints, say, a client and a server, is made via a TCP socket, and XML is transferred between these endpoints. However, it's not just random fragments of XML flowing back and forth. There is a structure, a choreography, imposed upon that flow. The entire conversation that takes place between these two endpoints is embodied in a pair of XML documents.
The conversation is two-way, duplexed across a socket connection. On one side, the client sends an XML document to the server. On the other side, the server responds by sending an XML document to the client. Figure 5-3 shows the pair of XML documents being streamed across the TCP socket connection between client and server, over time.
Figure 5-3. Client<->Server conversation as a pair of streamed XML documents
+--------+                                               +--------+
|        |                                               |        |
|        |                  TCP socket                   |        |
|        |=========////==================================|        |
|        |  client sends XML document to server --->     |        |
|        |---------////----------------------------------|        |
|        |</root>..////...<child2/>..<child1/>..<root> ->|        |
|        |---------////-----------------------------doc1-|        |
| client |                                               | server |
|        |---------------------------------////----------|        |
|        |<- <root>..<child1/>..<child2/>..////...</root>|        |
|        |-doc2----------------------------////----------|        |
|        |     <--- server sends XML document to client  |        |
|        |=================================////==========|        |
|        |                                               |        |
|        |                                               |        |
+--------+                                               +--------+
Key:
=====================
TCP socket connection
=====================
//// - gap in time
---------------------
streamed XML document
---------------------
But what do we mean when we say that the conversation is an XML document? To answer this, consider this simple XML document:
<?xml version="1.0"?>
<roottag>
  <fragment1/>
  <fragment2/>
  <fragment3/>
  ...
  <fragmentN/>
</roottag>
The document starts with an XML declaration:
<?xml version="1.0"?>
which is immediately followed by the opening root tag. This root tag is significant because there can be only one (and, of course, its corresponding closing tag) in the whole document. In effect, it wraps and contextualizes the content of the document.
<roottag> ... </roottag>
The real content of the document is made up of the XML fragments that come after the opening root tag:
<fragment1/> <fragment2/> <fragment3/> ... <fragmentN/>
So, taking a connection between a Jabber client and a Jabber server as an example, this is exactly what we have. The server is listening on port 5222 for incoming client-initiated connections. Once a client has successfully connected to the Jabber server, it sends an XML declaration and the opening root tag to announce its intentions to the server, which in turn responds by sending an XML declaration and opening root tag of its own.
From then on, every subsequent piece of data that the client sends to the server over the lifetime of the connection is an XML fragment (<fragmentN/>). The client can close the connection by sending the matching closing root tag. Of course, the server can also close the connection by sending the closing root tag of its own XML document.
The fragments sent within the body of the XML document are the XML building blocks on which Jabber solutions are based. These XML building blocks are introduced and examined in the next section.
Suffice it to say here that these fragments can come in any order within the body of the XML document, precisely because they're in the body. As long as an XML document has a root tag, and the fragments themselves are well-formed, then it doesn't matter what the content is. Because the document is parsed in chunks, as it appears, it doesn't matter if the fragments appear over a long period, which is the case in a client/server connection where messages and data are passed back and forth over time.
It should be fairly easy now to guess why this section (and the technique) is called “XML Streams”. XML is streamed over a connection in the form of a document, and is parsed and acted upon by the recipient in fragments, as they appear.
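The idea of parsing a document in chunks, as it appears, can be illustrated with a minimal Python sketch. It uses the standard library's XMLPullParser rather than anything Jabber-specific; the function name fragments and the way the sample document is cut into chunks are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

def fragments(chunks):
    """Yield the tag name of each top-level fragment once it is complete."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0  # 0 = outside the document, 1 = inside the root tag
    for chunk in chunks:
        parser.feed(chunk)  # hand over whatever has arrived so far
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root has just been closed.
                    yield elem.tag

# The sample document from the text, arriving in arbitrary pieces over time.
chunks = [
    "<?xml version='1.0'?><roottag><frag",
    "ment1/><fragment2>some text",
    "</fragment2>",
    "<fragment3/></roottag>",
]
seen = list(fragments(chunks))
print(seen)
```

The recipient never needs the whole document in hand: each fragment can be acted upon when its closing tag has been parsed, and the closing root tag simply ends the stream.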
Earlier, we said that the opening document tag was used by the client to “announce its intentions.” The following is a typical opening document tag from a Jabber client which has made a socket connection to port 5222 on the Jabber server jabber.org.
<stream:stream
xmlns:stream="http://etherx.jabber.org/streams"
to="jabber.org"
xmlns="jabber:client">
There are four parts to this opening tag.
<stream:stream>
Every streaming Jabber XML document must start, and end, with a tag named stream, qualified with the stream namespace.
xmlns:stream="http://etherx.jabber.org/streams"
The declaration of the stream namespace also comes in the opening stream tag. It refers to a URL (http://etherx.jabber.org/streams) which is a fixed value, and serves to uniquely identify the stream namespace used in the XML document, rooted with <stream/>, that is streamed over a Jabber connection.
The namespace qualifies only the tags that are prefixed "stream:". Apart from stream, there is one other tag name used in these documents that is qualified by this namespace, and that is error. The <stream:error/> tag is used to convey Jabber XML stream connection errors, such as premature disconnection, invalid namespace specifications, incomplete root tag definitions, a timeout while waiting for authentication to follow the root tag exchange, and so on.
to="jabber.org"
There is a to attribute that specifies to which Jabber server the connection is to be made, and where the user session is to be started and maintained.
We've already specified the jabber.org hostname, representing our Jabber server, when defining the socket connection (jabber.org:5222), so why do we need to specify it again here? The hostname used for the socket connection identifies the physical host to which we connect. However, there may be a choice of logical hosts running within the Jabber server on that physical host. Now that we're connected, we specify jabber.org again in the to attribute, this time as the logical host to which we want to connect inside the Jabber server running on the jabber.org host.
This "repeat specification" is required, because there's a difference between a physical Jabber host, and a logical Jabber host. In the section called A Tour of jabber.xml in Chapter 4 we see how a single Jabber server can be set up to service user sessions (with one or more JSMs) that are each identified with different logical hostnames. This is where the physical/logical hostname distinction comes from, and why it's necessary to specify a name in the root <stream:stream> tag's to attribute.
It just so happens that in our example opening tag the logical hostname is the same as the physical one, jabber.org. This will be the most common case. However, an Internet Service Provider (ISP), for example, may wish to offer Jabber services to its customers and dedicate a single host for that purpose. That host has various DNS names, which all resolve to the same host IP address. Only one Jabber server is run on that host. (If a second server were to be installed, it would have to listen on different, non-standard ports, which would be less than ideal.) To reflect the different names under which it wants to offer Jabber services, the ISP would run multiple JSMs under different logical names (using different values for each <host/> configuration tag, as explained in the section called A Tour of jabber.xml in Chapter 4). When connecting to that Jabber server, it may well be that the logical name specified in the opening tag's to attribute is different from the physical name used to reach the host in the first place.
xmlns="jabber:client"
In addition to the namespace that qualifies the stream and error tag names (which could be seen as representing the "outer shell" of the document), the xmlns attribute specifies a namespace that qualifies the tags in the body of the document: the conversation fragments of XML that will appear over time. This namespace is jabber:client, and it signifies that the conversation about to ensue over this connection is a client (to server) conversation.
This namespace specification is required because a client connection is just one type of connection that can be made with a Jabber server, and different connections carry conversations with different content. Table 5-2 lists the conversation namespaces currently defined in the Jabber protocol.
Table 5-2. Conversation Namespaces
| Namespace | Description |
|---|---|
| jabber:client | This is the namespace that qualifies a connection between a Jabber client and a Jabber server. |
| jabber:server | This namespace qualifies a connection between two Jabber servers. Dialback (host verification mechanism) conversations take place within the jabber:server namespace. |
| jabber:component:accept | When an external program connects to a Jabber server via a TCP sockets connection, this namespace is used to qualify the pair of XML documents exchanged over the connection. |
| jabber:component:exec | When an external program connects to a Jabber server via a STDIO connection, this namespace is used to qualify the pair of XML documents exchanged over the connection. [a] |
| Notes: a. For more details on external program connections to Jabber, see Chapter 4. | |
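Pulling the four parts of the opening tag together, the following Python sketch builds a stream header and then checks which namespace ends up qualifying which tag. The helper name stream_header, the two example hostnames, and the <message/> placeholder fragment are hypothetical; the document is also closed artificially so that it can be parsed in one go, which a live stream would not do until the end of the conversation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

STREAM_NS = "http://etherx.jabber.org/streams"  # fixed value given in the text

def stream_header(logical_host, namespace="jabber:client"):
    """Build the opening tag: stream namespace, logical host, body namespace."""
    return (
        "<?xml version='1.0'?>"
        "<stream:stream"
        f" xmlns:stream={quoteattr(STREAM_NS)}"
        f" to={quoteattr(logical_host)}"
        f" xmlns={quoteattr(namespace)}>"
    )

# Hypothetical names: the socket goes to the physical host, while the
# to attribute names the logical host inside the Jabber server.
physical_host = ("jabber.isp.example", 5222)
header = stream_header("chat.customer.example")

# Placeholder body content, then an artificial closing root tag.
document = header + "<message><body>hello</body></message><stream:error/></stream:stream>"
root = ET.fromstring(document)
qualified = [root.tag] + [child.tag for child in root]

print(root.get("to"))
for tag in qualified:
    print(tag)
```

ElementTree reports each tag together with the namespace that qualifies it, so the "outer shell" tags (stream and error) show up under the stream namespace, while the body fragment shows up under jabber:client.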
To complete our initial look at XML streams in a Jabber client-server conversation, let's have a look at what the Jabber server might send in response to the opening tag from the client:
<stream:stream
xmlns:stream='http://etherx.jabber.org/streams'
id='3AFD6862'
xmlns='jabber:client'
from='jabber.org'>
There are a couple of differences between this opening tag from the server and the opening tag from the client, above and beyond the fact that the response's opening tag belongs to a document that is streamed along the socket in the opposite direction to the document to which the request's opening tag belongs. The first difference is that there's a from attribute instead of a to attribute. The second difference is that there's an extra attribute, id. Let's look at these in turn.
from="jabber.org"
The from attribute is fairly straightforward; it normally serves to confirm to the client that the requested logical host is available. If the host is available, the value of the from attribute from the server will match the value of the to attribute from the client. However, in some circumstances the value can be different. The value sent in the from attribute is a redirection, or respecification, of the logical host by which the Jabber server (or more specifically the JSM component within the Jabber server) is actually known.
Logical host aliases can be defined in the Jabber server's configuration to “convert” a hostname specified in the incoming to attribute. The <alias/> tag, which is used to define these logical host aliases, is described in the section called Component Instance: c2s in Chapter 4. But how are these hostname conversions used? Here's an example…
Let's say that you're running a Jabber server on an internal network that doesn't have an available DNS server. The host where the Jabber server runs is called apollo, and its IP address is 192.168.1.4. Some people will connect to the host via the hostname because they have it defined in a local /etc/hosts file; others will connect via the IP address. Normally, the hostname (or IP address) specified in the connection parameters given to a Jabber client will be:
Used to build the socket connection to the Jabber server.
Specified in the to attribute in the opening XML stream to specify the logical host.
If the JSM section of the Jabber server is defined to have a hostname of apollo:
<host><jabberd:cmdline flag='h'>apollo</jabberd:cmdline></host>
then we need to make sure that the Jabber client uses that name when forming any JIDs for that Jabber server (e.g., the JID apollo used as an addressee for an IQ browse request). Having this:
<alias to='apollo'>192.168.1.4</alias>
in our c2s instance configuration would mean that any incoming XML stream header with a value of 192.168.1.4 in the to attribute:
<stream:stream
to="192.168.1.4"
xmlns="jabber:client"
xmlns:stream="http://etherx.jabber.org/streams">
would elicit the following response:
<stream:stream
from='apollo'
id='1830EF6A'
xmlns='jabber:client'
xmlns:stream='http://etherx.jabber.org/streams'>
which effectively says: “Okay, you requested 192.168.1.4, but please use apollo instead.” The client should use the value confirmed in the from attribute when referring to that Jabber server in all subsequent stream fragments.
Not specifying an <alias/> tag in this example would result in problems for the client. Without any way of checking and converting incoming hostnames, the c2s component will by default simply transfer the value from the to attribute to the from attribute in its stream header reply.
Following this thread to its natural conclusion, it's worth pointing out that if we have an alias specification like this:
<alias to='apollo'/>
then the value of the from attribute in the reply will always be set to apollo regardless of what's specified in the to attribute. This means that the to attribute could be left out of the opening stream tag. Although this serves well to illustrate the point, it is not good practice.
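The alias behavior described above can be modelled in a few lines of Python. This is only a sketch of the rules as stated in the text, not the c2s component's actual code: the function name reply_from and its parameters are hypothetical, and the precedence of a specific alias over a catch-all alias is an assumption.

```python
def reply_from(to_value, aliases, catch_all=None):
    """Choose the value of the from attribute in the server's stream header.

    aliases   -- maps an incoming to value to a logical host name,
                 modelling <alias to='apollo'>192.168.1.4</alias>
    catch_all -- models <alias to='apollo'/>, which applies to any to value
    """
    if to_value in aliases:
        return aliases[to_value]   # a specific alias converts the name
    if catch_all is not None:
        return catch_all           # the empty alias always wins otherwise
    return to_value                # default: echo the to value back

aliases = {"192.168.1.4": "apollo"}
print(reply_from("192.168.1.4", aliases))           # converted by the alias
print(reply_from("apollo", aliases))                # echoed unchanged
print(reply_from(None, {}, catch_all="apollo"))     # to attribute left out
```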
id='3AFD6862'
The id attribute is the ID of the XML stream, and is used in the subsequent authorization steps, which are described in Chapter 6. The value is a random hexadecimal string generated by the server, and is not important per se. What is important is that it's a value that is random, and shared between server and client. The server knows what it is because it generated it, and the client knows what it is because the server sends it in the opening tag of the response.
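A client has to pick the id and from values out of the server's opening tag while the document is still open. One way to do that, sketched here with Python's standard expat bindings, is to feed the header to a streaming parser and record the attributes when the root tag starts. The function name read_stream_header is hypothetical.

```python
import xml.parsers.expat

def read_stream_header(data):
    """Return the attributes of the opening <stream:stream> tag as a dict."""
    found = {}

    def on_start(name, attrs):
        # Without namespace processing, expat reports the prefixed name.
        if name == "stream:stream":
            found.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    # False means "more data will follow": the root tag is never closed here.
    parser.Parse(data, False)
    return found

server_reply = (
    "<stream:stream xmlns:stream='http://etherx.jabber.org/streams' "
    "id='3AFD6862' xmlns='jabber:client' from='jabber.org'>"
)
header = read_stream_header(server_reply)
print(header["id"], header["from"])
```

The value of from is the name the client should use for the server from then on, and the value of id is what the client keeps for the authorization steps.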
Now that we know how a conversation with a Jabber server is started, let's try it ourselves. At a stretch, one could say that the simplest Jabber client, just like the simplest HTTP client, or the simplest client that has to interact with any server that employs a text-based protocol over a socket connection, is telnet.
Simply point telnet at a Jabber server, specifying port 5222, and send an opening tag. You will receive an opening tag from the server in response:
yak:~$ telnet localhost 5222
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
<?xml version='1.0'?>
<stream:stream xmlns:stream='http://etherx.jabber.org/streams' to='yak' xmlns='jabber:client'>
<?xml version='1.0'?><stream:stream xmlns:stream='http://etherx.jabber.org/streams' id='3AFD839E' xmlns='jabber:client' from='yak'>
(If you haven't got a Jabber server to experiment with, see Chapter 3 on how to set one up.)
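The same exchange can be scripted. The following Python sketch sends an opening tag over an already connected socket and reads until the server's opening tag is complete. The function name open_stream is hypothetical, and the end-of-tag check is deliberately naive: it assumes that no attribute value contains a > character.

```python
import socket

def open_stream(sock, logical_host):
    """Send the client's opening tag and return the server's opening tag."""
    header = (
        "<?xml version='1.0'?>"
        "<stream:stream xmlns:stream='http://etherx.jabber.org/streams' "
        f"to='{logical_host}' xmlns='jabber:client'>"
    )
    sock.sendall(header.encode("utf-8"))

    reply = b""
    while True:
        start = reply.find(b"<stream:stream")
        if start != -1 and b">" in reply[start:]:
            break                      # the server's root tag has arrived
        data = sock.recv(4096)
        if not data:
            break                      # the server closed the connection
        reply += data
    return reply.decode("utf-8")

# Usage against a real server (not run here), mirroring the telnet session:
#   with socket.create_connection(("localhost", 5222)) as sock:
#       print(open_stream(sock, "yak"))
```

Taking the socket as a parameter keeps the physical connection (host and port) separate from the logical host named in the to attribute.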
Using telnet is a great way to find out more about the way the Jabber protocol works. Perhaps the next thing to do is try out the user registration and authentication steps described in Chapter 6. But watch out: send some invalid XML and the server will close the connection on you!
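Why invalid XML is fatal becomes clearer when you see how a streaming parser reacts to it. The Python sketch below feeds chunks to an expat parser and reports the first well-formedness error. It illustrates the general principle only and is not the Jabber server's own code; the function name first_error and the sample fragments are hypothetical.

```python
import xml.parsers.expat

def first_error(chunks):
    """Feed chunks to a streaming parser; return the first error text, or None."""
    parser = xml.parsers.expat.ParserCreate()
    for chunk in chunks:
        try:
            parser.Parse(chunk, False)   # False: the stream is still open
        except xml.parsers.expat.ExpatError as err:
            return xml.parsers.expat.ErrorString(err.code)
    return None

opening = (
    "<stream:stream xmlns:stream='http://etherx.jabber.org/streams' "
    "to='yak' xmlns='jabber:client'>"
)
good = first_error([opening, "<message><body>hi</body></message>"])
bad = first_error([opening, "<message><body>oops</message>"])  # <body> never closed
print(good)
print(bad)
```

Once the parser has rejected a fragment, the rest of the document can no longer be interpreted, which is why ending the stream is the only sensible reaction.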