[OpenAFS] Prolonged period of blocked connections

Will Maier willmaier@ml1.net
Wed, 4 Feb 2009 15:51:51 -0600


Hi Derrick-

On Wed, Feb 04, 2009 at 04:42:05PM -0500, Derrick Brashear wrote:
> On Wed, Feb 4, 2009 at 4:38 PM, Will Maier <willmaier@ml1.net> wrote:
> > In the past, we've observed prolonged periods where one or more of
> > our servers would report more than 200 calls waiting for a thread.
> > This occurred again this morning and lasted for about four hours.
> 
> bos status (fileserverhost) fs -long
> 
> and post that information?

Here's what I get:

    Instance fs, (type is fs) currently running normally.
        Auxiliary status is: file server running.
        Process last started at Wed Feb  4 12:01:36 2009 (6 proc starts)
        Last exit at Wed Feb  4 12:01:36 2009
        Command 1 is '/usr/afs/bin/fileserver'
        Command 2 is '/usr/afs/bin/volserver'
        Command 3 is '/usr/afs/bin/salvager'

Here's what top says currently:

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     3084 root      11  -5  204m  17m 1216 S  174  0.2 299:30.71 fileserver

And, for good measure, here's what rxdebug shows for the server, too:

    Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
    not waiting for packets.
    226 calls waiting for a thread
    2 threads are idle
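
In case it's useful for watching this over time, here's a rough sketch
of how the two thread counters could be pulled out of that output. It
assumes the line format shown above; parse_rxdebug, is_backed_up and the
200-call threshold are my own illustrative names and values, not
anything shipped with OpenAFS.

```python
import re

# Sample text copied from the rxdebug output above.
SAMPLE = """\
Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
not waiting for packets.
226 calls waiting for a thread
2 threads are idle
"""


def parse_rxdebug(text):
    """Return the waiting-call and idle-thread counts found in text.

    A counter is None when its line is absent, since the exact output
    format is an assumption based on the sample above.
    """
    waiting = re.search(r"(\d+) calls waiting for a thread", text)
    idle = re.search(r"(\d+) threads are idle", text)
    return {
        "waiting_calls": int(waiting.group(1)) if waiting else None,
        "idle_threads": int(idle.group(1)) if idle else None,
    }


def is_backed_up(stats, threshold=200):
    """Hypothetical check: True when more than threshold calls are waiting."""
    waiting = stats["waiting_calls"]
    return waiting is not None and waiting > threshold


if __name__ == "__main__":
    stats = parse_rxdebug(SAMPLE)
    print(stats, "backed up:", is_backed_up(stats))
```

Feeding it the saved output of an rxdebug run in place of SAMPLE would
give the same two numbers for that run.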


> However, lots of bugs which would affect this fixed since 1.4.1,
> which is ancient.

Indeed. We've been upgrading within RHEL releases so far, but we're
planning to jump from RHEL4 (sigh) to RHEL5 (finally) in the near
future. That should, at a glance, get us to at least 1.4.7.

Thanks!

-- 

[Will Maier]-----------------[willmaier@ml1.net|http://www.lfod.us/]