Troubleshooting OSAD and jabberd

Open File Count Exceeded

SYMPTOMS: OSAD clients cannot contact the SUSE Manager Server, and jabberd requires long periods of time to respond on port 5222.

CAUSE: The number of maximum files that a jabber user can open is lower than the number of connected clients. Each client requires one permanently open TCP connection and each connection requires one file handler. The result is jabberd begins to queue and refuse connections.

CURE: Edit the /etc/security/limits.conf to something similar to the following: jabbersoftnofile<#clients + 100> jabberhardnofile<#clients + 1000>

This will vary according to your setup. For example in the case of 5000 clients: jabbersoftnofile5100 jabberhardnofile6000

Ensure you update the /etc/jabberd/c2s.xml max_fds parameter as well. For example: <max_fds>6000</max_fds>

EXPLANATION: The soft file limit is the limit of the maximum number of open files for a single process. In SUSE Manager the highest consuming process is c2s, which opens a connection per client. 100 additional files are added, here, to accommodate for any non-connection file that c2s requires to work correctly. The hard limit applies to all processes belonging to the jabber user, and accounts for open files from the router, s2s and sm processes additionally.

jabberd Database Corruption

SYMPTOMS: After a disk is full error or a disk crash event, the jabberd database may have become corrupted. jabberd may then fail to start during spacewalk-service start:

Starting spacewalk services...
   Initializing jabberd processes...
       Starting router                                                                   done
       Starting sm startproc:  exit status of parent of /usr/bin/sm: 2                   failed
   Terminating jabberd processes...

/var/log/messages shows more details:

jabberd/sm[31445]: starting up
jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[31445]: loading 'db' storage module
jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
jabberd/router[31437]: shutting down

CURE: Remove the jabberd database and restart. Jabberd will automatically re-create the database:

spacewalk-service stop
 rm -Rf /var/lib/jabberd/db/*
 spacewalk-service start

An alternative approach would be to test another database, but SUSE Manager does not deliver drivers for this:

rcosa-dispatcher stop
 rcjabberd stop
 cd /var/lib/jabberd/db
 rm *
 cp /usr/share/doc/packages/jabberd/db-setup.sqlite .
 sqlite3 sqlite.db < db-setup.sqlite
 chown jabber:jabber *
 rcjabberd start
 rcosa-dispatcher start

Capturing XMPP Network Data for Debugging Purposes

If you are experiencing bugs regarding OSAD, it can be useful to dump network messages in order to help with debugging. The following procedures provide information on capturing data from both the client and server side.

Procedure: Server Side Capture
  1. Install the tcpdump package on the SUSE Manager Server as root: zypper in tcpdump

  2. Stop the OSA dispatcher and Jabber processes with rcosa-dispatcher stop and rcjabberd stop.

  3. Start data capture on port 5222: tcpdump -s 0 port 5222 -w server_dump.pcap

  4. Start the OSA dispatcher and Jabber processes: rcosa-dispatcher start and rcjabberd start.

  5. Open a second terminal and execute the following commands: rcosa-dispatcher start and rcjabberd start.

  6. Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.

  7. Once you have finished your capture re-open terminal 1 and stop the capture of data with: CTRL+c

Procedure: Client Side Capture
  1. Install the tcpdump package on your client as root: zypper in tcpdump

  2. Stop the OSA process: rcosad stop.

  3. Begin data capture on port 5222: tcpdump -s 0 port 5222 -w client_client_dump.pcap

  4. Open a second terminal and start the OSA process: rcosad start

  5. Operate the SUSE Manager server and clients so the bug you formerly experienced is reproduced.

  6. Once you have finished your capture re-open terminal 1 and stop the capture of data with: CTRL+c

Engineering Notes: Analyzing Captured Data

This section provides information on analyzing the previously captured data from client and server.

  1. Obtain the certificate file from your SUSE Manager server: /etc/pki/spacewalk/jabberd/server.pem

  2. Edit the certificate file removing all lines before ----BEGIN RSA PRIVATE KEY-----, save it as key.pem

  3. Install Wireshark as root with: zypper in wireshark

  4. Open the captured file in wireshark.

  5. From Eidt  ]menu:Preferences[ select SSL from the left pane.

  6. Select RSA keys list: Edit  ]menu:New[

    • IP Address any

    • Port: 5222

    • Protocol: xmpp

    • Key File: open the key.pem file previously edited.

    • Password: leave blank

    For more information see also: