Keyword assistant

Many of the Jabber core and peripheral developers hang out in a conference room called "jdev" hosted by the Conferencing component on the Jabber server running at jabber.org. While a lot of useful information was to be gleaned from listening to what went on in "jdev", it wasn't possible for me to be there all the time. The conversations in "jdev" were logged to web pages, which I used to visit after the fact to try and catch up with things. This was mostly a hopeless task, so I decided to try and improve the situation slightly by looking for keywords and Uniform Resource Locators (URLs) in those logs.

This recipe is an automated version of what I did manually. It is a script that connects to a Jabber server, enters a conference room, and sits quite still, listening to the conversation and letting me know when certain words or phrases are uttered. I've given the script a bit of "intelligence" in that it can be interacted with and told, while running, to watch for, or stop watching for, words and phrases of our choosing.

The script introduces us to programmatic interaction with the Conferencing component. Before looking at the script, let's have a brief overview of Conferencing in general.

Conferencing

The Conferencing component at jabber.org has the name conference.jabber.org. Details of the component instance configuration for such a Conferencing component can be found in the section called Component Instance: conf in Chapter 4, where we see that the component exists as a shared object library connected with the library load component connection method. This component provides general conferencing facilities, orientated around a conference room and conference user model.

A Jabber user can "enter" (or "join") a conference room and thereby becomes a conference user identified by a nickname that is chosen upon entering that room. Nicknames are generally used in conference rooms to provide a modicum of privacy—it is assumed that by default you don't want to let the other conference room members know your real JID.

The Conferencing component that is currently available for use with the Jabber 1.4.1 server supports two protocols for user and room interaction. One is simple and provides basic features; the other is more complex and adds facilities such as password-protected rooms and room descriptions: [1]

Groupchat

The Groupchat protocol is the simpler of the two and provides basic functions for entering and exiting conference rooms, and choosing nicknames.

This Groupchat protocol has a nominal protocol version number of 1.0. It is known as the "presence-based" protocol, because the protocol is based upon <presence/> elements that are used for room entry, exit, and nickname determination.

Conference

The Conference protocol offers more advanced features than the Groupchat protocol and makes use of two IQ namespaces—jabber:iq:conference and jabber:iq:browse. It has a nominal protocol version number of 1.4 which reflects the version of the Jabber server with which it is delivered. Sometimes this version number is referred to as 0.4, such as in the downloadable tarball, and in the value returned in response to a "version query" on the component itself, as shown in Example 8-1.

The version number isn't that important. The main thing to note is that the component that is called "conference.so" (see the reference to the shared object library in the section called Component Connection Method in Chapter 4) supports both the Groupchat protocol and the Conference protocol. If you come across a shared object library called groupchat.so, this is the original Conferencing component that was made available with Jabber server version 1.0. This library only supports the Groupchat protocol.

Example 8-1. Querying the Conferencing component's version

SEND: <iq type='get' to='conference.gnu.mine.nu'>
        <query xmlns='jabber:iq:version'/>
      </iq>

RECV: <iq to='dj@gnu.mine.nu/jarl' from='conference.gnu.mine.nu'
        type='result'>
        <query xmlns='jabber:iq:version'>
          <name>conference</name>
          <version>0.4</version>
          <os>Linux 2.4.2-2</os>
        </query>
      </iq>
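A client that wants to act on a reply like the one in Example 8-1 has to read the <name/>, <version/> and <os/> children out of the namespaced <query/> element. What follows is only a minimal sketch of that step, written in Python 3 with the standard library's xml.etree.ElementTree rather than JabberPy. The function name parse_version_reply is made up for illustration, and the fragment is parsed exactly as printed above (on a real stream the <iq/> element would also carry the stream's default namespace).

```python
import xml.etree.ElementTree as ET

NS_VERSION = 'jabber:iq:version'

def parse_version_reply(xml_text):
    """Return (name, version, os) from a jabber:iq:version result, or None."""
    iq = ET.fromstring(xml_text)
    if iq.tag != 'iq' or iq.get('type') != 'result':
        return None
    query = iq.find('{%s}query' % NS_VERSION)
    if query is None:
        return None
    # The children inherit the namespace declared on <query/>.
    return tuple(query.findtext('{%s}%s' % (NS_VERSION, tag))
                 for tag in ('name', 'version', 'os'))

reply = """<iq to='dj@gnu.mine.nu/jarl' from='conference.gnu.mine.nu' type='result'>
  <query xmlns='jabber:iq:version'>
    <name>conference</name>
    <version>0.4</version>
    <os>Linux 2.4.2-2</os>
  </query>
</iq>"""

print(parse_version_reply(reply))   # ('conference', '0.4', 'Linux 2.4.2-2')
```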

In this recipe we'll be using the simpler "Groupchat" protocol. It's widely used, and easy to understand. Example 8-2 shows a typical element log from Groupchat-based activity. It shows a user, with the JID qmacro@jabber.com, entering a room called "cellar", hosted on the conference component at conf.merlix.dyndns.org, a room which currently has two other occupants who go by the nicknames "flash" and "roscoe". The elements are from qmacro's perspective.

Example 8-2. The Groupchat protocol in action

qmacro tries to enter the room with the nickname "flash", and fails:

SEND: <presence to='cellar@conf.merlix.dyndns.org/flash'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/flash'
              type='error'>
        <error code='409'>Conflict</error>
      </presence>

He tries again, this time with a different nickname "deejay". Success:

SEND: <presence to='cellar@conf.merlix.dyndns.org/deejay'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/flash'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/roscoe'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/deejay'/>

RECV: <message to='qmacro@jabber.com/jarltk'
             type='groupchat' from='cellar@conf.merlix.dyndns.org'>
        <body>qmacro has become available</body>
      </message>

"roscoe" says hi, and qmacro waves back:

RECV: <message to='qmacro@jabber.com/jarltk'
             from='cellar@conf.merlix.dyndns.org/roscoe'
             type='groupchat' cnu=''>
        <body>hi qmacro</body>
      </message>

SEND: <message to='cellar@conf.merlix.dyndns.org' type='groupchat'>
        <body>/me waves to everyone</body>
      </message>

"flash" sends qmacro a private message:

RECV: <message to='qmacro@jabber.com/jarltk'
             from='cellar@conf.merlix.dyndns.org/flash'
             type='chat'>
        <body>psst - want to come out for a beer or two?</body>
        <thread>jarl1998911094</thread>
      </message>

"roscoe" leaves the room:

RECV: <presence to='qmacro@jabber.com/jarltk' type='unavailable'
              from='cellar@conf.merlix.dyndns.org/roscoe'/>

RECV: <message to='qmacro@jabber.com/jarltk' type='groupchat'
             from='cellar@conf.merlix.dyndns.org'>
        <body>roscoe has left</body>
      </message>

Let's take the elements in Example 8-2 bit by bit.

Failed attempt to enter room:

qmacro makes an attempt to enter the room using the Groupchat protocol. This is done by sending a directed <presence/> element to a particular JID that represents the room and the chosen nickname. This JID is constructed as follows:

[room name]@[conference component]/[nickname]

In this example, the conferencing component is identified with the hostname conf.merlix.dyndns.org. qmacro's choice of nickname is "flash", determining that the JID, to which available presence should be sent, be:

cellar@conf.merlix.dyndns.org/flash

Thus the following element is sent:

SEND: <presence to='cellar@conf.merlix.dyndns.org/flash'/>

The conference component determines that there is already someone present in the room cellar@conf.merlix.dyndns.org with the nickname "flash", and so notifies qmacro by sending back a directed presence with an <error/> tag:

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/flash'
              type='error'>
        <error code='409'>Conflict</error>
      </presence>

Note that the <presence/> element has the type "error", and comes from the artificial JID that we constructed in our room entry attempt. The element is addressed to our real JID, of course—qmacro@jabber.com/jarltk—as otherwise it wouldn't reach us.

The error code 409 and text "Conflict" tell qmacro that the nickname conflicted with one already in the room. This is a standard error code/text pair; Table 5-3 shows a complete set of code/text pairs.

At this stage, qmacro is not yet in the room.
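The construction of the room JID and of the entry request can be sketched in a few lines. This is an illustration in Python 3 using only the standard library, not the JabberPy calls used later in the recipe; the helper names room_jid and entry_presence are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def room_jid(room, service, nick):
    """Build [room name]@[conference component]/[nickname]."""
    return '%s@%s/%s' % (room, service, nick)

def entry_presence(room, service, nick):
    """Serialize the directed available <presence/> used to request entry."""
    # No type attribute is set: a typeless <presence/> means "available".
    element = ET.Element('presence', {'to': room_jid(room, service, nick)})
    return ET.tostring(element, encoding='unicode')

print(room_jid('cellar', 'conf.merlix.dyndns.org', 'flash'))
print(entry_presence('cellar', 'conf.merlix.dyndns.org', 'flash'))
```

The first line printed is cellar@conf.merlix.dyndns.org/flash; the second is the equivalent of the SEND element shown above, although ElementTree writes the attribute with double quotes.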

Successful attempt to enter room:

qmacro tries again, this time with a different nickname "deejay": [2]

SEND: <presence to='cellar@conf.merlix.dyndns.org/deejay'/>

This time, there is no conflict—no other user is in the room "cellar" with that nickname—and the conference component registers the entry. It does this by sending qmacro the presence of all the room occupants, including that of himself as "deejay":

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/flash'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/roscoe'/>

RECV: <presence to='qmacro@jabber.com/jarltk'
              from='cellar@conf.merlix.dyndns.org/deejay'/>

These presence elements are also sent to the other room occupants.

Conference component-generated notification:

In addition to the presence elements sent for each room occupant, a general room-wide message noting that someone with the nickname "deejay" just entered the room, is sent out by the component as a type='groupchat' message to all the room occupants. Like all the other people, qmacro receives it:

RECV: <message to='qmacro@jabber.com/jarltk'
            type='groupchat'
            from='cellar@conf.merlix.dyndns.org'>
        <body>qmacro has become available</body>
      </message>

The text "has become available" used in the body of the message is taken directly from the Action Notices definitions, part of the Conferencing component instance configuration described in the section called Custom Configuration in Chapter 4. Note that the identity of the room itself is simply a generic version of the JID that the room occupants use to enter:

cellar@conf.merlix.dyndns.org

Room-wide chat:

Once the user with the nickname "roscoe" sees that qmacro has entered the room (he knows that qmacro sometimes goes by the nickname "deejay" and so recognises him), he sends his greetings, and qmacro waves back.

RECV: <message to='qmacro@jabber.com/jarltk'
             from='cellar@conf.merlix.dyndns.org/roscoe'
             type='groupchat' cnu=''>
        <body>hi qmacro</body>
      </message>

SEND: <message to='cellar@conf.merlix.dyndns.org'
        type='groupchat'>
        <body>/me waves to everyone</body>
      </message>

As with the notification message, each message is a "groupchat" type message. The one received appears to come from cellar@conf.merlix.dyndns.org/roscoe, which is the JID representing the user in the room with the nickname "roscoe". This way, roscoe's real JID is never sent to qmacro. [3] The one sent is addressed to the room's identity cellar@conf.merlix.dyndns.org, and contains a message that starts with "/me". This is simply a convention that is understood by clients that support conferencing, meant to represent an action and displayed thus:

* deejay waves to everyone
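How a client might apply the "/me" convention when displaying room messages can be sketched as follows. This is Python 3 and purely illustrative: the function name render_groupchat and the "<nick> text" layout used for ordinary messages are assumptions, not something prescribed by the protocol.

```python
def render_groupchat(nick, body):
    """Format one room message for display, honouring the /me convention."""
    if body.startswith('/me '):
        # An action: drop the "/me " prefix and lead with the nickname.
        return '* %s %s' % (nick, body[4:])
    return '<%s> %s' % (nick, body)

print(render_groupchat('deejay', '/me waves to everyone'))  # * deejay waves to everyone
print(render_groupchat('roscoe', 'hi qmacro'))              # <roscoe> hi qmacro
```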

One-on-one chat:

The Conferencing component also supports a one-on-one chat mode, which is just like normal chat mode (where <message/>s with the type "chat" are exchanged) except that the routing goes through the component. The intended recipient of a conference-routed chat message is identified by his room JID. So in this example:

RECV: <message to='qmacro@jabber.com/jarltk'
        from='cellar@conf.merlix.dyndns.org/flash'
        type='chat'>
        <body>psst - want to come out for a beer or two?</body>
        <thread>jarl1998911094</thread>
      </message>

The user with the nickname "flash" actually addressed the chat message to the JID:

cellar@conf.merlix.dyndns.org/deejay

which arrived at the Conferencing component (the hostname "conf.merlix.dyndns.org" caused the <message/> element to be routed there). The component then looked up internally who "deejay" really was (qmacro@jabber.com/jarltk) and sent it on. This way, the recipient of a conference-routed message never discovers the real JID of the sender. In all other ways, the actual <message/> element is like any other <message/> element; in this case it contains a message <body/> and a chat <thread/>. [4]
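The lookup and readdressing described here can be pictured with a toy model. To be clear, this is not the Conferencing component's actual code, just a Python 3 sketch of the idea; the Room class, its method names, and the real JID given to "flash" (flash@example.org/home) are all invented for the example.

```python
class Room:
    """Toy model of one conference room's nickname table."""

    def __init__(self, room_jid):
        self.room_jid = room_jid
        self.real_jid_by_nick = {}          # nickname -> real JID

    def enter(self, nick, real_jid):
        if nick in self.real_jid_by_nick:
            raise ValueError('409 Conflict')
        self.real_jid_by_nick[nick] = real_jid

    def nick_of(self, real_jid):
        for nick, jid in self.real_jid_by_nick.items():
            if jid == real_jid:
                return nick
        return None

    def route_private(self, sender_real_jid, to_room_jid, body):
        """Readdress a chat message so the recipient sees only room JIDs."""
        target_nick = to_room_jid.split('/', 1)[1]
        return {
            'to': self.real_jid_by_nick[target_nick],
            'from': '%s/%s' % (self.room_jid, self.nick_of(sender_real_jid)),
            'type': 'chat',
            'body': body,
        }

cellar = Room('cellar@conf.merlix.dyndns.org')
cellar.enter('flash', 'flash@example.org/home')       # hypothetical real JID
cellar.enter('deejay', 'qmacro@jabber.com/jarltk')
delivered = cellar.route_private('flash@example.org/home',
                                 'cellar@conf.merlix.dyndns.org/deejay',
                                 'psst - want to come out for a beer or two?')
print(delivered['to'])     # qmacro@jabber.com/jarltk
print(delivered['from'])   # cellar@conf.merlix.dyndns.org/flash
```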

Leaving the room:

In the same way that room entrance is effected by sending an available presence (remember, a <presence/> element without an explicit type attribute is understood to represent type='available'), leaving a room is achieved by doing the opposite—sending an unavailable presence to the room, which is relayed to all the remaining room occupants:

RECV: <presence to='qmacro@jabber.com/jarltk' type='unavailable'
              from='cellar@conf.merlix.dyndns.org/roscoe'/>

The fact that "roscoe" left the room is conveyed by the unavailable presence packet. This is by and large for the benefit of each user's client, so that the room occupant list can be updated. The component also sends out a notification, in the same way as it sends a notification out when someone joins:

RECV: <message to='qmacro@jabber.com/jarltk' type='groupchat'
             from='cellar@conf.merlix.dyndns.org'>
        <body>roscoe has left</body>
      </message>

Like the join notification, the text for the leave notification ("has left") comes directly from the component instance configuration described in the section called Custom Configuration in Chapter 4.
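The occupant list that a client keeps up to date from these <presence/> elements can be modelled very simply. The following Python 3 sketch assumes that each presence has already been reduced to its from attribute and its type (None when the attribute is absent); the name update_occupants is made up for the example.

```python
def update_occupants(occupants, from_jid, presence_type=None):
    """Apply one room <presence/> to a set of nicknames and return the set."""
    nick = from_jid.partition('/')[2]
    if not nick:
        return occupants                 # no nickname part: nothing to track
    if presence_type == 'unavailable':
        occupants.discard(nick)          # occupant left the room
    elif presence_type is None:          # no type attribute means available
        occupants.add(nick)
    return occupants                     # other types (e.g. error) are ignored

occupants = set()
room = 'cellar@conf.merlix.dyndns.org/'
for nick in ('flash', 'roscoe', 'deejay'):
    update_occupants(occupants, room + nick)
update_occupants(occupants, room + 'roscoe', 'unavailable')
print(sorted(occupants))   # ['deejay', 'flash']
```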

The script's scope

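We're going to write the script in Python, using the JabberPy library. What we want the script to do is what was outlined at the start of this recipe: connect to a Jabber server, enter a conference room, listen quietly to the conversation, accept commands to watch for (or stop watching for) words and phrases, and notify the people who asked when those words and phrases come up.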

To keep things simple, we're going to set the identity of the Jabber server and the conference room in variables in the script. What we do need to keep track of at runtime is which users ask our assistant for which words and phrases. We'll use a hash ('dictionary' in Python terms) to do this. Having a look at what this hash will look like during the lifetime of this script will help us to visualize what we're trying to achieve. Example 8-3 shows what the contents of the hash might look like at any given time.

Example 8-3. Typical contents of the Keyword assistant's hash

{
  'dj@gnu.pipetree.com/home':             {
                                            'http:': 1,
                                            'ftp:': 1
                                          },

  'piers@jabber.org/work':                {
                                            'Perl': 1,
                                            'Java': 1,
                                            'SAP R/3': 1
                                          },

  'cellar@conf.merlix.dyndns.org/roscoe': { 
                                            'Hazzard': 1
                                          }
}

We can see from the contents of the hash in Example 8-3 that three people have asked the script to look out for words and phrases. Two of those people, dj and piers, have interacted with the script directly, that is, by sending the script a 'normal' (or 'chat') <message/>. The other person, with the conference nickname "roscoe", is in the "cellar" room and has sent the script a message routed through the Conferencing component in the same way that "flash" sent qmacro a message in Example 8-2 earlier: the JID of the sender belongs to (has the hostname set to) the conference component. Technically there's nothing to distinguish the three JIDs here; it's just that we know from the name that conf.merlix.dyndns.org is the name that identifies such a component.

dj wants to be notified if any Web or FTP URLs are mentioned. piers is interested in references to two of his favorite languages and his favorite ERP solution, and roscoe, well...

We said we'd give the script a little bit of intelligence. This was a reference to the ability for users to interact with the script while it runs, rather than have to give the script a static list of words and phrases in a configuration file. dj, piers and the user with the "roscoe" nickname have all done this, sending the script messages with simple keyword commands, such as:

dj: "watch http:"
script: "ok, watching for http:"

dj: "watch gopher:"
script: "ok, watching for gopher:"

dj: "watch ftp:"
script: "ok, watching for ftp:"

dj: "ignore gopher:"
script: "ok, now ignoring gopher:"

...

piers: "list"
script: "watching for: Perl, Java, SAP R/3"

...

roscoe: "stop"
script: "ok, I've stopped watching"

Step by step

Taking the script step by step, the first section is probably familiar if you've seen the previous Python-based scripts in the section called CVS notification in Chapter 7 and the section called Presence-sensitive CVS notification in Chapter 7.

import Jabber, XMLStream
from string import split, join, find
import sys

We bring in all the functions and libraries that we'll be needing. We'll be using the find function from the string library to help us with our searching.

Next, we declare our hash, or dictionary, which will hold a list of the words that the script is to look out for, for each person, as shown in Example 8-3.

keywords = {}

Maintaining the keyword hash

To maintain this hash, we need a couple of subroutines that will add and remove words from a person's individual list. These subroutines will be called when a "command" such as "watch" or "ignore" is recognised, in the callback subroutine that will handle incoming <message/> elements.

Here are those two subroutines:

def addword(jid, word):
    if not keywords.has_key(jid):
        keywords[jid] = {}
    keywords[jid][word] = 1

def delword(jid, word):
    if keywords.has_key(jid):
        if keywords[jid].has_key(word):
            del keywords[jid][word]

To each of them, we pass a string representation of the JID (in jid) of the correspondent giving the command, along with the word or phrase specified (in word). The hash has two levels—the first level is keyed by JID, the second by word or phrase. We use a hash, rather than an array, at the second level simply to make removal of words and phrases easier.
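The has_key() method used in these subroutines was removed in Python 3. For readers on a newer interpreter, an equivalent pair of subroutines might look like the sketch below, which keeps the same two-level hash but leans on setdefault() and pop(); it is offered as one possible translation rather than as part of the original script.

```python
keywords = {}

def addword(jid, word):
    # setdefault creates the inner dictionary the first time a JID is seen.
    keywords.setdefault(jid, {})[word] = 1

def delword(jid, word):
    # pop with a default does nothing if the JID or the word is unknown.
    keywords.get(jid, {}).pop(word, None)

addword('dj@gnu.pipetree.com/home', 'http:')
addword('dj@gnu.pipetree.com/home', 'gopher:')
delword('dj@gnu.pipetree.com/home', 'gopher:')
delword('piers@jabber.org/work', 'Perl')      # unknown JID: silently ignored
print(keywords)   # {'dj@gnu.pipetree.com/home': {'http:': 1}}
```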

Message callback

Next, we define our callback to handle incoming <message/> elements.

def messageCB(con, msg):

    type = msg.getType()
    if type == None:
        type = 'normal'

As usual, we're expecting our message callback to be passed the Jabber Connection object (in con) and the message object itself (in msg). How this callback is to proceed is determined by the type of message received. We determine the type (taken from the <message/> element's type attribute) and store it in the type variable. Remember that if no type attribute is present, a type of "normal" is assumed. [5]

The two sorts of incoming messages we're expecting this script to receive are those conveying the room's conversation—in "groupchat" type messages—and those over which the commands such as "watch" and "ignore" are carried, which we expect in the form of "normal" or "chat" type messages.

Incoming commands

The first main section of the messageCB handler deals with the incoming commands:

    # deal with interaction
    if type == 'chat' or type == 'normal':
        jid = str(msg.getFrom())

        message = split(msg.getBody(), None, 1)
        reply = ""

        if message[0] == 'watch':
            addword(jid, message[1])
            reply = "ok, watching for " + message[1]
        
        if message[0] == 'ignore':
            delword(jid, message[1])
            reply = "ok, now ignoring " + message[1]

        if message[0] == 'list':
            if keywords.has_key(jid):
                reply = "watching for: " + join(keywords[jid].keys(), ", ")
            else:
                reply = "not watching for any keywords"

        if message[0] == 'stop':
            if keywords.has_key(jid):
                del keywords[jid]
                reply = "ok, I've stopped watching"
           
        if reply:
            con.send(msg.build_reply(reply))


Here's that section one chunk at a time.

If the <message/> element turns out to be of the type in which we're expecting a potential command, we want to determine the JID of the correspondent who sent that message. Calling the getFrom() method will return us a JID object. What we need is the string representation of that, which can be determined by calling the str() function on that JID object:

        jid = str(msg.getFrom())

Then we grab the content of the message by calling the getBody() method on the msg object, and split the whole thing on the first run of whitespace. This should be enough for us to distinguish a command ("watch", "ignore", and so on) from the keywords. After the split, the first element (index 0) in the message array will be the command, and the second element (index 1) will be the word or phrase, if given. At this stage we also declare an empty reply.

        message = split(msg.getBody(), None, 1)
        reply = ""
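It is worth seeing what such a split actually returns, because a command sent without a word or phrase yields a one-element list, and an empty body yields no elements at all. The sketch below uses the Python 3 string method split() in place of the string module function, and wraps it in a helper, parse_command, whose name is invented for the example.

```python
def parse_command(body):
    """Split a message body into (command, argument); either may be None."""
    parts = (body or '').split(None, 1)    # at most one split, on whitespace
    if not parts:
        return None, None
    command = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return command, argument

print(parse_command('watch SAP R/3'))   # ('watch', 'SAP R/3')
print(parse_command('list'))            # ('list', None)
print(parse_command(''))                # (None, None)
```

Checking for a missing argument in this way avoids an index error when someone sends just "watch" or "ignore".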

Now it's time to determine whether what the script was sent made sense as a command:

        if message[0] == 'watch':
            addword(jid, message[1])
            reply = "ok, watching for " + message[1]
        
        if message[0] == 'ignore':
            delword(jid, message[1])
            reply = "ok, now ignoring " + message[1]

        if message[0] == 'list':
            if keywords.has_key(jid):
                reply = "watching for: " + join(keywords[jid].keys(), ", ")
            else:
                reply = "not watching for any keywords"

        if message[0] == 'stop':
            if keywords.has_key(jid):
                del keywords[jid]
                reply = "ok, I've stopped watching"

We go through a series of checks, taking appropriate action for our supported commands:

  • watch (watch for a particular word or phrase)

  • ignore (stop watching for a particular word or phrase)

  • list (list the words and phrases currently being watched)

  • stop (stop watching altogether—remove my list of words and phrases)

The addword() and delword() functions defined earlier are used here, as well as simpler expressions that list:

keywords[jid].keys()

or remove:

del keywords[jid]

the words and phrases for a particular JID.

If there was something recognisable for the script to do, we get it to reply appropriately:

        if reply:
            con.send(msg.build_reply(reply))

The build_reply() function creates a reply out of a message object by setting the to attribute to the value of the original <message/> element's from attribute, and preserving the element's type attribute and <thread/> tag, if present. The <body/> of the reply object (which is, after all, just a <message/> element) is set to whatever is passed in the function call; in this case, it's the text in the reply variable.
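The command handling can also be pulled out of the callback into a plain function that takes the hash, the sender's JID and the message body, and returns the reply text. The Python 3 sketch below does this; handle_command is a hypothetical name, and the function deliberately differs from the script in two small ways: it ignores "watch" and "ignore" when no word or phrase is given, and it reports "not watching for any keywords" when a list exists but is empty.

```python
def handle_command(keywords, jid, body):
    """Apply one command from jid to keywords; return the reply text or ''."""
    parts = (body or '').split(None, 1)
    if not parts:
        return ''
    command = parts[0]
    argument = parts[1] if len(parts) > 1 else None

    if command == 'watch' and argument:
        keywords.setdefault(jid, {})[argument] = 1
        return 'ok, watching for ' + argument
    if command == 'ignore' and argument:
        keywords.get(jid, {}).pop(argument, None)
        return 'ok, now ignoring ' + argument
    if command == 'list':
        if keywords.get(jid):
            return 'watching for: ' + ', '.join(keywords[jid])
        return 'not watching for any keywords'
    if command == 'stop' and jid in keywords:
        del keywords[jid]
        return "ok, I've stopped watching"
    return ''                                # unrecognised: stay silent

keywords = {}
dj = 'dj@gnu.pipetree.com/home'
for body in ('watch http:', 'watch gopher:', 'watch ftp:',
             'ignore gopher:', 'list', 'watch'):
    print(repr(body), '->', repr(handle_command(keywords, dj, body)))
```

Keeping the logic free of any connection object makes it easy to try out at the interpreter prompt before wiring it into the message callback.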

Word and phrase scanning

Now that we've dealt with incoming commands, we just need another section in the message callback subroutine to scan for the words and phrases. The target texts for this scanning will be the snippets of room conversation, which arrive at the callback in the form of "groupchat" type <message/> elements.

    # scan room talk
    if type == 'groupchat':
        message = msg.getBody()

The message variable holds the string we need to scan, and it's just a case of checking for each of the words or phrases on behalf of each of the users that have asked:

        for jid in keywords.keys():
            for word in keywords[jid].keys():
                if find(message, word) >= 0:
                    con.send(Jabber.Message(jid, word + ": " + message))

If we get a hit, we construct a new Message object, passing the JID of the person for whom the string has matched (in the jid variable), and the notification consisting of the word or phrase that was found (in word) and the context in which it was found (the sentence uttered, in message). The <message/> so constructed is then sent to that user. By default, the Message constructor specifies no type attribute, so that the user is sent a "normal" message.
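The scanning logic can be tried out on its own, without a Jabber connection, by having it return the notifications instead of sending them. The Python 3 sketch below does that; the function name scan and the sample sentence are invented for the example. Like find(), the in operator performs a case-sensitive substring match, so "perl" would not trigger a watch for "Perl".

```python
def scan(keywords, text):
    """Return (jid, notification) pairs for every watched word found in text."""
    hits = []
    if not text:                     # e.g. a groupchat message with no body
        return hits
    for jid, words in keywords.items():
        for word in words:
            if word in text:         # substring match, case-sensitive
                hits.append((jid, word + ': ' + text))
    return hits

keywords = {
    'dj@gnu.pipetree.com/home': {'http:': 1, 'ftp:': 1},
    'piers@jabber.org/work': {'Perl': 1},
}
for jid, note in scan(keywords, 'the notes are at http://www.example.org/'):
    print(jid, '<-', note)
```

Only dj is notified here, once, for the "http:" entry.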

Presence callback

Having dealt with the incoming <message/> elements that we're expecting, we turn our attention to <presence/> elements. Most of those we receive in this conference room context will be notifications of people entering and leaving the room that we're going to be in, as shown in Example 8-2. We want to perform housekeeping on our keywords hash so that the entries don't become stale. We also want to deal with the potential nickname conflict problem.

Nickname conflict

We want to check for the possibility of nickname conflict problems which may occur when we enter the room, and our chosen nickname is already taken.

Remembering that a conflict notification will look something like this:

<presence to='qmacro@jabber.com/jarltk'
        from='cellar@conf.merlix.dyndns.org/flash' 
        type='error'>
  <error code='409'>Conflict</error>
</presence>

we test for the receipt of a <presence/> element so formed:

def presenceCB(con, prs):

    # deal with nickname conflict in room
    if str(prs.getFrom()) == roomjid and prs.getType() == 'error':
        prsnode = prs.asNode()
        error = prsnode.getTag('error')
        if error:
          if (error.getAttr('code') == '409'):
              print "cannot join room - conflicting nickname"
              con.disconnect()
              sys.exit(0)

The <presence/> element will appear to be sent from the JID that we constructed for our initial room entry negotiation (in the roomjid variable further down in the script), for example, in our case:

jdev@conference.jabber.org/kassist

We compare this value to the value of the incoming <presence/>'s from attribute, and also make sure that the type attribute is set to "error".

If it is, we want to extract the details from the <error/> tag that will be contained as a direct child of the <presence/>. The JabberPy library doesn't offer a direct high-level function to get at this tag from the Presence object (in prs), but we can strip away the presence object "mantle" and get at the underlying object, which is a neutral "node": a Jabber element, or XML fragment, without any preconceived ideas of what it is (and therefore without any accompanying high-level methods such as getBody() or setPriority()). [6]

The asNode() method gives us what we need: a Protocol object representation of our <presence/> element. From this we can get to the <error/> tag and its contents. If we find that we do have a nickname conflict, we abort by disconnecting from the Jabber server and ending the script.
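The same test (is this an error presence, and does its <error/> child carry the code 409?) can be expressed against the raw XML. The sketch below is in Python 3 and uses the standard library's xml.etree.ElementTree in place of JabberPy's node object, so the calls differ from those in the script; error_code is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def error_code(presence_xml):
    """Return the code attribute of a presence error as a string, or None."""
    presence = ET.fromstring(presence_xml)
    if presence.tag != 'presence' or presence.get('type') != 'error':
        return None
    error = presence.find('error')       # direct child of <presence/>
    return error.get('code') if error is not None else None

conflict = """<presence to='qmacro@jabber.com/jarltk'
          from='cellar@conf.merlix.dyndns.org/flash'
          type='error'>
  <error code='409'>Conflict</error>
</presence>"""

print(error_code(conflict))   # 409
```

Note that the code comes back as a string, which is why the script compares it with '409' rather than with the number 409.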

Keyword housekeeping

The general idea is that this script will run indefinitely and notify the users on a continuous basis. No presence subscription relationships are built (mostly to keep the script small and simple; you could adapt the mechanism from the recipe in the section called Presence-sensitive CVS notification in Chapter 7 if you wanted to make this script sensitive to presence) and so notifications will get queued up for the user if he is offline. [7] This makes a lot of sense for the most part; I still want to have the script send me notifications even if I'm offline. However, consider that the script could be sent a command, to watch for a keyword or phrase, from a user within the room. We would receive the command from a JID like this:

jdev@conference.jabber.org/nickname

This is a "transient" JID, in that it represents a user's presence in the jdev room for a particular session. If a word is spotted by the script hours or days later, there's a good chance that the user has left the room, making the JID invalid as a recipient: although the JID is technically valid and will reach the conferencing component, there will be no real user JID that it is paired up with. Potentially worse, the room occupant's identity JID may be assigned to someone else at a later stage, if the original user left and a new user entered choosing the same nick as the original user had chosen. The sidebar Transient JIDs and non-existent JIDs discusses the difference between a "transient" JID and a non-existent JID.

So as soon as we notice a user leave the room we're in, which will be indicated through a <presence/> element conveying that occupant's unavailability, we should remove any watched-for words and phrases from our hash:

    # remove keyword list for groupchat correspondent
    if prs.getType() == 'unavailable':
        jid = str(prs.getFrom())
        if keywords.has_key(jid):
            del keywords[jid]

As before, we obtain the string representation of the JID using the str() function on the JID object that represents the presence element's sender, obtained via the getFrom() method.
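The effect of this housekeeping on the hash from Example 8-3 can be shown with a small stand-alone function. This is a Python 3 sketch; forget_on_leave is a hypothetical name, and the presence is assumed to have been reduced already to its from and type attributes.

```python
def forget_on_leave(keywords, from_jid, presence_type):
    """Drop the keyword list of a correspondent whose presence went unavailable."""
    if presence_type == 'unavailable':
        # pop with a default avoids a KeyError for JIDs that never asked.
        keywords.pop(from_jid, None)
    return keywords

keywords = {
    'dj@gnu.pipetree.com/home': {'http:': 1, 'ftp:': 1},
    'piers@jabber.org/work': {'Perl': 1, 'Java': 1, 'SAP R/3': 1},
    'cellar@conf.merlix.dyndns.org/roscoe': {'Hazzard': 1},
}
forget_on_leave(keywords, 'cellar@conf.merlix.dyndns.org/roscoe', 'unavailable')
print(sorted(keywords))   # ['dj@gnu.pipetree.com/home', 'piers@jabber.org/work']
```

The entries for dj and piers survive because, with no presence subscriptions in place, the script never receives unavailable presence from their real JIDs.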

Example 8-4. A message to a non-existent transient JID is rejected

SEND: <message to='jdev@conference.jabber.org/qmacro'>
        <body>Hello there</body>
      </message>

RECV: <message to='dj@gnu.mine.nu/jarl' 
             from='jdev@conference.jabber.org/qmacro' type='error'>
        <body>Hello there</body>
        <error code='404'>Not Found</error>
      </message>

The main script

Ok. We've got our subroutines and callbacks set up. All that remains is for us to define our Jabber server and room information:

Server   = 'gnu.mine.nu'
Username = 'kassist'
Password = 'pass'
Resource = 'py'

Room     = 'jdev'
ConfServ = 'conference.jabber.org'
Nick     = 'kassist'

The kassist user can be set up simply by using the reguser script presented in the section called User Registration Script in Chapter 6:

$ ./reguser gnu.mine.nu username=kassist password=pass
[Attempt] (kassist) Successful registration
$

In the same way as previous recipes' scripts, a connection attempt, followed by an authentication attempt, is made:

con = Jabber.Connection(host=Server,debug=0,log=0)
try:
    con.connect()
except XMLStream.error, e:
    print "Couldn't connect: %s" % e 
    sys.exit(1)
else:
    print "Connected"

if con.auth(Username,Password,Resource):
    print "Logged in as %s to server %s" % ( Username, Server )
else:
    print "problems authenticating: ", con.lastErr, con.lastErrCode
    sys.exit(1)

Then the message and presence callbacks messageCB() and presenceCB() are registered with our Connection object con:

con.setMessageHandler(messageCB)
con.setPresenceHandler(presenceCB)

After sending initial presence, informing the JSM (and anyone that might be subscribed to kassist's presence) of our availability:

con.send(Jabber.Presence())

we also construct—from the Room, ConfServ, and Nick variables—and send the <presence/> element for negotiating entry to the "jdev" room hosted by the Conferencing component at conference.jabber.org:

roomjid = Room + '@' + ConfServ + '/' + Nick
print "Joining " + Room
con.send(Jabber.Presence(to=roomjid))

con.send() will send a <presence/> element that looks like this:

SEND: <presence to='jdev@conference.jabber.org/kassist'/>

We're sending available presence to the room, to negotiate entry, but what about the initial presence? Why do we send that too, when there are no users likely to be subscribed to the kassist JID? Well, if no initial presence is sent, the JSM will merely store up any <message/> elements destined for kassist, as it will think the JID is offline. That won't help at all.

The processing loop

Once everything has been set up (initial presence has been sent and the room has been entered), we simply need to have the script sit back, wait for incoming packets, and handle them appropriately. For this, we call the process() method repeatedly, waiting up to 5 seconds at a time for elements to arrive on the XML stream:

while(1):
    con.process(5)
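As written, the loop only ends when the script is killed. One possible refinement, sketched here in Python 3, is to catch the interrupt raised by Ctrl-C and disconnect before exiting. The StubConnection class is a hypothetical stand-in that offers only the process() and disconnect() calls the loop relies on, so that the sketch can run without a Jabber server; it is not part of JabberPy.

```python
class StubConnection:
    """Hypothetical stand-in exposing the two calls the loop relies on."""

    def __init__(self, rounds):
        self.rounds = rounds
        self.connected = True

    def process(self, timeout):
        # Pretend the user pressed Ctrl-C once the rounds are used up.
        if self.rounds == 0:
            raise KeyboardInterrupt
        self.rounds -= 1

    def disconnect(self):
        self.connected = False

def run(con, timeout=5):
    """Process incoming elements until interrupted, then disconnect."""
    completed = 0
    try:
        while True:
            con.process(timeout)
            completed += 1
    except KeyboardInterrupt:
        pass                      # fall through to the clean-up below
    finally:
        con.disconnect()
    return completed

con = StubConnection(rounds=3)
print(run(con), con.connected)   # 3 False
```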

The whole script

Here's the Keyword assistant script in its entirety.

import Jabber, XMLStream
from string import split, join, find
import sys

keywords = {}

def addword(jid, word):
    if not keywords.has_key(jid):
        keywords[jid] = {}
    keywords[jid][word] = 1

def delword(jid, word):
    if keywords.has_key(jid):
        if keywords[jid].has_key(word):
            del keywords[jid][word]

def messageCB(con, msg):

    type = msg.getType()
    if type == None:
        type = 'normal'

    # deal with interaction
    if type == 'chat' or type == 'normal':
        jid = str(msg.getFrom())

        message = split(msg.getBody(), None, 1)
        reply = ""

        if message[0] == 'watch':
            addword(jid, message[1])
            reply = "ok, watching for " + message[1]
        
        if message[0] == 'ignore':
            delword(jid, message[1])
            reply = "ok, now ignoring " + message[1]

        if message[0] == 'list':
            if keywords.has_key(jid):
                reply = "watching for: " + join(keywords[jid].keys(), ", ")
            else:
                reply = "not watching for any keywords"

        if message[0] == 'stop':
            if keywords.has_key(jid):
                del keywords[jid]
                reply = "ok, I've stopped watching"
           
        if reply:
            con.send(msg.build_reply(reply))


    # scan room talk
    if type == 'groupchat':
        message = msg.getBody()

        for jid in keywords.keys():
            for word in keywords[jid].keys():
                if find(message, word) >= 0:
                    con.send(Jabber.Message(jid, word + ": " + message))


def presenceCB(con, prs):

    # deal with nickname conflict in room
    if str(prs.getFrom()) == roomjid and prs.getType() == 'error':
        prsnode = prs.asNode()
        error = prsnode.getTag('error')
        if error:
            if error.getAttr('code') == '409':
                print "cannot join room - conflicting nickname"
                con.disconnect()
                sys.exit(0)

    # remove keyword list for groupchat correspondent
    if prs.getType() == 'unavailable':
        jid = str(prs.getFrom())
        if keywords.has_key(jid):
            del keywords[jid]

Server   = 'gnu.mine.nu'
Username = 'kassist'
Password = 'pass'
Resource = 'py'

Room     = 'jdev'
ConfServ = 'conference.jabber.org'
Nick     = 'kassist'

con = Jabber.Connection(host=Server,debug=0,log=0)
try:
    con.connect()
except XMLStream.error, e:
    print "Couldn't connect: %s" % e 
    sys.exit(0)
else:
    print "Connected"

if con.auth(Username,Password,Resource):
    print "Logged in as %s to server %s" % ( Username, Server )
else:
    print "problems authenticating: ", con.lastErr, con.lastErrCode
    sys.exit(1)

con.setMessageHandler(messageCB)
con.setPresenceHandler(presenceCB)

con.send(Jabber.Presence())

roomjid = Room + '@' + ConfServ + '/' + Nick
print "Joining " + Room
con.send(Jabber.Presence(to=roomjid))

while(1):
    con.process(5)
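As written, messageCB() indexes message[0] and message[1] directly, so a message with no body, or a bare "watch" with no keyword after it, would raise an exception inside the callback. A more defensive way to split the command from its argument might look like the following sketch. The function name parse_command() is hypothetical and not part of the script or of Jabberpy, and the code assumes a current Python 3 interpreter.

```python
def parse_command(body):
    """Hypothetical helper: return (command, argument); either may be None."""
    if not body:
        # No <body/> at all, or an empty one.
        return None, None
    parts = body.split(None, 1)   # split on the first run of whitespace
    if not parts:
        # Body consisted of whitespace only.
        return None, None
    command = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return command, argument

print(parse_command("watch flying pigs"))
print(parse_command("watch"))
print(parse_command(None))
```

A callback using this helper could then check that the argument is not None before calling addword() or delword(), and reply with a short usage hint otherwise.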

Notes

[1]

There is also a third protocol, called "Experimental iq:groupchat", which came in between the Groupchat and Conference protocols. This reflected an experimental move to add features to the basic Groupchat protocol by the use of IQ elements, the contents of which were qualified by a namespace "jabber:iq:groupchat". This protocol has been dropped, and support for it exists only in certain versions of a couple of public Jabber clients: WinJab and JIM.

[2]

There's no rule that says the nickname can't be the same as the user part of your JID, if you're not concerned with hiding your true identity :-)

[3]

Ignore the cnu attribute. It's put there by the component, and should never make it out to the client endpoints. The attribute name is a short name for "conference user", and refers to the internal structure that represents a conference room occupant within the component.

[4]

See the section called The Message Element in Chapter 5 for details on the <message/> element.

[5]

See the section called Message Attributes in Chapter 5 for details of <message/> attributes.

[6]

If this seems a little cryptic, just think of it like this: the Presence, Message, and IQ classes are merely subclasses of the base class Protocol, which represents elements generically.

[7]

By the mod_offline module of the Jabber Session Manager (JSM).

[8]

Of course, if the JID referred to a non-existent Jabber server, then the error returned wouldn't be a "Not Found" error 404, but an "Unable to resolve hostname" error 502.