[OpenAFS] Random hangs with OpenAFS 1.2.11 and RHEL 3

Tim Carlson <tim.carlson@pnl.gov>
Mon, 26 Apr 2004 11:17:48 -0700 (PDT)


We've been experiencing random hangs with the OpenAFS 1.2.11 client on
RHEL 3.  I call them random because I have not yet been able to
reproduce a hang. A bit more info on our setup:

OpenAFS 1.2.11 (rebuilt from the source RPMs to add our own ThisCell and
CellServDB, plus a line in the init script that runs "fs sysname i386_aw3")

RHEL WS 3 (kernels 2.4.21-9.0.3.ELsmp, 2.4.21-9.0.1.ELsmp, 2.4.21-9.ELsmp)

Our AFS servers run the latest Transarc release on Solaris/SPARC.

The machine occasionally locks up when you run more/less on a file. This
seems to happen once every few days. It is very strange because our home
directories are in AFS, and we have pine or mozilla running all the time
against those AFS home directories. The problem doesn't seem to be related
to increased AFS I/O, because I can run the bonnie++ benchmark in an AFS
volume without causing the system to lock.

When I say the system locks, I mean that there is no longer any response
from the console (running Gnome). You can ping the box, but it does not
answer ssh requests. The only solution is a power cycle.

There are no messages in the logs related to the lockups.

Has anybody experienced similar problems? Any workarounds/solutions?

Thanks

Tim

Tim Carlson
Voice: (509) 376 3423
Email: Tim.Carlson@pnl.gov
EMSL UNIX System Support