[OpenAFS] Prolonged period of blocked connections

Derrick Brashear shadow@gmail.com
Wed, 4 Feb 2009 17:13:47 -0500


On Wed, Feb 4, 2009 at 4:51 PM, Will Maier <willmaier@ml1.net> wrote:
> Hi Derrick-
>
> On Wed, Feb 04, 2009 at 04:42:05PM -0500, Derrick Brashear wrote:
>> On Wed, Feb 4, 2009 at 4:38 PM, Will Maier <willmaier@ml1.net> wrote:
>> > In the past, we've observed prolonged periods where one or more of
>> > our servers would report more than 200 calls waiting for a thread.
>> > This occurred again this morning and lasted for about four hours.
>>
>> bos status (fileserverhost) fs -long
>>
>> and post that information?
>
> Here's what I get:
>
>    Instance fs, (type is fs) currently running normally.
>        Auxiliary status is: file server running.
>        Process last started at Wed Feb  4 12:01:36 2009 (6 proc starts)
>        Last exit at Wed Feb  4 12:01:36 2009
>        Command 1 is '/usr/afs/bin/fileserver'

which means the fileserver is running with only its default, very small number of threads

Adding -p 128 to the fileserver command, for starters, would make life much better, but...
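A minimal Python sketch of the check behind that advice: the "Command 1 is '/usr/afs/bin/fileserver'" line in the bos status output carries no -p option, so the fileserver uses its default thread count. The helper names fileserver_command and with_thread_count are hypothetical, and the regex assumes the bos output wording quoted in this thread. The sketch only builds the new command line. Getting it into the fs instance on the server is not shown.

```python
import re
import shlex

# Matches lines such as: Command 1 is '/usr/afs/bin/fileserver'
# (assumes the "bos status ... -long" wording quoted in this thread)
COMMAND_RE = re.compile(r"Command \d+ is '(?P<cmd>[^']*)'")


def fileserver_command(bos_output):
    """Return the first command line whose program is the fileserver, or None."""
    for match in COMMAND_RE.finditer(bos_output):
        cmd = match.group("cmd")
        parts = cmd.split()
        if parts and parts[0].endswith("/fileserver"):
            return cmd
    return None


def with_thread_count(cmd, threads=128):
    """Return cmd with an explicit '-p <threads>', replacing any existing value."""
    args = shlex.split(cmd)
    if "-p" in args:
        i = args.index("-p")
        if i + 1 < len(args):
            args[i + 1] = str(threads)  # replace the argument after -p
        else:
            args.append(str(threads))   # dangling -p with no value
    else:
        args += ["-p", str(threads)]    # no -p given: add one
    return " ".join(args)


if __name__ == "__main__":
    sample = """
       Command 1 is '/usr/afs/bin/fileserver'
       Command 2 is '/usr/afs/bin/volserver'
       Command 3 is '/usr/afs/bin/salvager'
"""
    cmd = fileserver_command(sample)
    print(cmd)                     # /usr/afs/bin/fileserver
    print(with_thread_count(cmd))  # /usr/afs/bin/fileserver -p 128
```

Replacing an existing -p value rather than appending a second one keeps the command line unambiguous if the instance was already given a thread count.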

>        Command 2 is '/usr/afs/bin/volserver'
>        Command 3 is '/usr/afs/bin/salvager'
>
> Here's what top says currently:
>
>      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>     3084 root      11  -5  204m  17m 1216 S  174  0.2 299:30.71 fileserver
>
> And, for good measure, here's what rxdebug shows for the server:
>
>    Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
>    not waiting for packets.
>    226 calls waiting for a thread
>    2 threads are idle
>
>
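A minimal Python sketch for spotting the condition described at the start of the thread (more than 200 calls waiting for a thread), assuming the rxdebug output wording shown above. The names thread_pressure and is_backlogged are hypothetical, and the 200 threshold comes from the original report rather than from any OpenAFS default.

```python
import re

# Patterns assume the rxdebug wording quoted above; singular forms are
# accepted as well, in case only one call or thread is reported.
WAITING_RE = re.compile(r"(\d+) calls? waiting for a thread")
IDLE_RE = re.compile(r"(\d+) threads? (?:are|is) idle")


def thread_pressure(rxdebug_output):
    """Return (calls_waiting, idle_threads); a line that is absent counts as 0."""
    waiting = WAITING_RE.search(rxdebug_output)
    idle = IDLE_RE.search(rxdebug_output)
    return (int(waiting.group(1)) if waiting else 0,
            int(idle.group(1)) if idle else 0)


def is_backlogged(rxdebug_output, threshold=200):
    """True when more than `threshold` calls are queued waiting for a thread."""
    waiting, _idle = thread_pressure(rxdebug_output)
    return waiting > threshold


if __name__ == "__main__":
    sample = """Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
not waiting for packets.
226 calls waiting for a thread
2 threads are idle
"""
    print(thread_pressure(sample))  # (226, 2)
    print(is_backlogged(sample))    # True
```

Keying on queued calls rather than idle threads matches this report: 2 idle threads alongside 226 waiting calls is exactly the thread starvation being described.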
>> However, lots of bugs that would affect this have been fixed since
>> 1.4.1, which is ancient.
>
> Indeed. We've been upgrading within RHEL releases so far, but we're
> planning to jump from RHEL4 (sigh) to RHEL5 (finally) in the near
> future. That should, at a glance, get us to at least 1.4.7.

1.4.8 exists for RHEL5, and for RHEL4...