[OpenAFS] Prolonged period of blocked connections
Derrick Brashear
shadow@gmail.com
Wed, 4 Feb 2009 17:13:47 -0500
On Wed, Feb 4, 2009 at 4:51 PM, Will Maier <willmaier@ml1.net> wrote:
> Hi Derrick-
>
> On Wed, Feb 04, 2009 at 04:42:05PM -0500, Derrick Brashear wrote:
>> On Wed, Feb 4, 2009 at 4:38 PM, Will Maier <willmaier@ml1.net> wrote:
>> > In the past, we've observed prolonged periods where one or more of
>> > our servers would report more than 200 calls waiting for a thread.
>> > This occurred again this morning and lasted for about four hours.
>>
>> Can you run
>>
>> bos status (fileserverhost) fs -long
>>
>> and post that information?
>
> Here's what I get:
>
> Instance fs, (type is fs) currently running normally.
> Auxiliary status is: file server running.
> Process last started at Wed Feb 4 12:01:36 2009 (6 proc starts)
> Last exit at Wed Feb 4 12:01:36 2009
> Command 1 is '/usr/afs/bin/fileserver'
No -p argument, which means you're running with the default, a very small
number of threads. Adding -p 128, for starters, would make life much better,
but...
> Command 2 is '/usr/afs/bin/volserver'
> Command 3 is '/usr/afs/bin/salvager'
>
> Here's what top says currently:
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3084 root 11 -5 204m 17m 1216 S 174 0.2 299:30.71 fileserver
>
> And, for good measure, here's what rxdebug shows for the server, too:
>
> Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
> not waiting for packets.
> 226 calls waiting for a thread
> 2 threads are idle
>
>
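If you want to catch this condition as it happens rather than after the fact, the two interesting lines of that rxdebug output are easy to pull out with a script. What follows is only a rough sketch in Python: the function names and the threshold of 200 (borrowed from the figure mentioned earlier in this thread) are illustrative choices, and it assumes the output keeps the exact wording shown above.

```python
import re

# Sample text copied from the rxdebug output quoted in this thread.
SAMPLE = """\
Free packets: 242, packet reclaims: 202, calls: 120829063, used FDs: 64
not waiting for packets.
226 calls waiting for a thread
2 threads are idle
"""


def parse_rxdebug(text):
    """Return (calls_waiting, idle_threads) from rxdebug-style text.

    Either value is None if its line is not present. This relies on the
    wording seen in the sample above; other versions may print it differently.
    """
    waiting = re.search(r"(\d+) calls waiting for a thread", text)
    idle = re.search(r"(\d+) threads are idle", text)
    return (
        int(waiting.group(1)) if waiting else None,
        int(idle.group(1)) if idle else None,
    )


def is_starved(text, threshold=200):
    """True if more than `threshold` calls are waiting for a thread.

    The default of 200 is a hypothetical alert level, not an OpenAFS constant.
    """
    waiting, _idle = parse_rxdebug(text)
    return waiting is not None and waiting > threshold


if __name__ == "__main__":
    waiting, idle = parse_rxdebug(SAMPLE)
    print(f"calls waiting: {waiting}, idle threads: {idle}")
    print("starved" if is_starved(SAMPLE) else "ok")
```

Feeding it the saved output of rxdebug from a cron job would be one way to use it; how you collect that output and where you send the alert is up to you.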
>> However, lots of bugs which would affect this have been fixed since
>> 1.4.1, which is ancient.
>
> Indeed. We've been upgrading within RHEL releases so far, but we're
> planning to jump from RHEL4 (sigh) to RHEL5 (finally) in the near
> future. That should, at a glance, get us to at least 1.4.7.
1.4.8 exists for RHEL5, and RHEL4....