Troubleshooting OSAD and jabberd

Open File Count Exceeded

The number of maximum files that a jabber user can open is lower than the number of connected clients. OSAD clients cannot contact the SUSE Manager Server, and jabberd requires long periods of time to respond on port 5222. Each client requires one permanently open TCP connection and each connection requires one file handler.

To resolve OSAD connection problems, edit such lines in /etc/security/limits.conf:

jabber   soft   nofile   <#clients + 100>
jabber   hard   nofile   <#clients + 1000>

Itwill vary according to your setup. For example, in the case of 5000 clients:

jabber   soft   nofile   5100
jabber   hard   nofile   6000

Ensure you also update the max_fds parameter in /etc/jabberd/c2s.xml. For example: <max_fds>6000</max_fds>

The soft file limit is the limit of the maximum number of open files for a single process. In Uyuni the highest consuming process is c2s, which opens a connection per client. 100 additional files are added, here, to accommodate for any non-connection file that c2s requires to work correctly. The hard limit applies to all processes belonging to the jabber user, and accounts for open files from the router, s2s and sm processes additionally.

jabberd Database Corruption

SYMPTOMS: After a disk is full error or a disk crash event, the jabberd database may have become corrupted. jabberd may then fail starting Spacewalk services:

Starting spacewalk services...
   Initializing jabberd processes...
       Starting router                                                                   done
       Starting sm startproc:  exit status of parent of /usr/bin/sm: 2                   failed
   Terminating jabberd processes...

/var/log/messages shows more details:

jabberd/sm[31445]: starting up
jabberd/sm[31445]: process id is 31445, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[31445]: loading 'db' storage module
jabberd/sm[31445]: db: corruption detected! close all jabberd processes and run db_recover
jabberd/router[31437]: shutting down

CURE: Remove the jabberd database and restart. jabberd will automatically re-create the database. Enter at the command prompt:

spacewalk-service stop
rm -rf /var/lib/jabberd/db/*
spacewalk-service start

Capturing XMPP Network Data for Debugging Purposes

If you are experiencing bugs regarding OSAD, it can be useful to dump network messages in order to help with debugging. The following procedures provide information on capturing data from both the client and server side.

Procedure: Server Side Capture
  1. Install the tcpdump package on the server as root:

    zypper in tcpdump
  2. Stop the OSA dispatcher and Jabber processes:

    rcosa-dispatcher stop
    rcjabberd stop
  3. Start data capture on port 5222:

    tcpdump -s 0 port 5222 -w server_dump.pcap
  4. Open a second terminal and start the OSA dispatcher and Jabber processes:

    rcosa-dispatcher start
    rcjabberd start
  5. Operate the server and clients so the bug you formerly experienced is reproduced.

  6. When you have finished your capture re-open the first terminal and stop the data capture with CTRL+c.

Procedure: Client Side Capture
  1. Install the tcpdump package on your client as root:

    zypper in tcpdump
  2. Stop the OSA process:

    rcosad stop
  3. Begin data capture on port 5222:

    tcpdump -s 0 port 5222 -w client_client_dump.pcap
  4. Open a second terminal and start the OSA process:

    rcosad start
  5. Operate the server and clients so the bug you formerly experienced is reproduced.

  6. When you have finished your capture re-open the first terminal and stop the data capture with CTRL+c.