[OpenAFS] AFS server processes hanging on SG servers

chas williams chas@locutus.cmf.nrl.navy.mil
Tue, 21 Jan 2003 10:47:17 -0500


In message <20030121131536.GC28240@afs.mcc.ac.uk>,Dr A V Le Blanc writes:
>For your information, we've now had the same thing happen with
>bosserver: that is, the process hung and is unreachable, high
>load average, and so on.

i have seen this problem before on an sgi running transarc afs 34a.
it is fairly rare, but there does not seem to be a good reason for the
hangs -- they must always be resolved with a reboot.

>(that buserver, ptserver, kaserver, and (yes) volserver have hung.)

the biggest difference between fileserver and the rest of the afs
servers might be lwp.  the (threaded) fileserver is slightly more
thread aware than the other servers.  i don't have any hard facts, but
lwp is common to all of the above and is slightly different for fileserver.
the lwp code is probably the most difficult part of the user-space
afs to implement correctly (the setjmp/longjmp mechanism, while clever,
may not be completely correct for sgi_65).
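
to make the setjmp/longjmp point concrete, here is a rough sketch of the
primitive in plain c.  this is not the lwp code; the names scheduler_ctx,
last_worker, worker and run_workers are made up for illustration.  it only
shows control jumping from a "worker" back to a saved "scheduler" context
in an enclosing frame, which is the portable use of longjmp.  as far as i
understand it, a real user-space thread package also has to give each
thread its own stack, and that step is platform-specific, which is where a
port could plausibly go wrong.

```c
#include <setjmp.h>

/* hypothetical names, for illustration only */
static jmp_buf scheduler_ctx;     /* saved "scheduler" context */
static volatile int last_worker;  /* id of the last worker that ran */

/* a "worker" records its id, then jumps back to the scheduler
 * instead of returning normally */
static void worker(int id)
{
    last_worker = id;
    longjmp(scheduler_ctx, 1);    /* does not return */
}

/* runs `count` workers one after another; each one hands control
 * back through longjmp.  returns the number of workers that ran. */
int run_workers(int count)
{
    /* volatile: modified between setjmp and longjmp, so it must not
     * be cached in a register that longjmp would roll back */
    volatile int finished = 0;

    if (setjmp(scheduler_ctx) != 0) {
        /* we get here each time a worker calls longjmp */
        finished++;
    }
    if (finished < count) {
        worker(finished + 1);
    }
    return finished;
}
```

calling run_workers(3) runs workers 1, 2 and 3 in turn and returns 3.
note that longjmp here only ever targets a frame that is still active;
jumping into a context whose frame has already returned is undefined in
standard c, so anything beyond this relies on per-platform behavior.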