[OpenAFS] Re: Linux server/client hangs and crashes
Wed, 11 Aug 2004 10:39:36 -0600
>> I have tried using the lwp version of fileserver, wrapped the
>> pthreads version with LD_ASSUME_KERNEL=2.4.1, and done tcpdump,
>> cmdebug, fstrace, etc. etc. ad nauseam.
> You actually copied the binary out of src/lwp/fileserver, and didn't
> just use what was as you said "the only fileserver"?
Yes - I explicitly compiled the entire source and then grabbed both
copies of the fileserver binary, src/viced/fileserver and
src/tviced/fileserver. I placed both in the /usr/afs/bin/ directory as
fileserver.lwp and fileserver.pthreads, and added a shell script
wrapper that sets LD_ASSUME_KERNEL and calls the pthreads variant.
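For anyone following along, a wrapper of the kind described above might
look roughly like the sketch below. It is an illustration under stated
assumptions, not the script actually used: FILESERVER_BIN and
set_thread_abi are hypothetical names, the install path is taken from
the description above, and 2.4.1 is the value quoted earlier in the
thread.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; the script actually used was not posted.
# The install path and file name follow the description in this thread.
FILESERVER_BIN=${FILESERVER_BIN:-/usr/afs/bin/fileserver.pthreads}

# Export LD_ASSUME_KERNEL so the dynamic loader selects the library
# variant built for the named kernel ABI (defaults to 2.4.1).
set_thread_abi() {
    LD_ASSUME_KERNEL=${1:-2.4.1}
    export LD_ASSUME_KERNEL
}

# Replace this shell with the server only if the binary is installed.
if [ -x "$FILESERVER_BIN" ]; then
    set_thread_abi 2.4.1
    exec "$FILESERVER_BIN" "$@"
fi
echo "wrapper: cannot execute $FILESERVER_BIN" >&2
```

On glibc builds of that era that shipped both thread libraries, setting
LD_ASSUME_KERNEL=2.4.1 was the usual way to fall back from NPTL to
LinuxThreads; whether a given system honors it depends on how its glibc
was built.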
I repeated the same test case on both flavors, with the same symptoms.
Mass write/copy/delete operations will hang - some single-file
write/read/delete operations can proceed but will sporadically hang.
Client shows an active cache entry with cmdebug, server reports itself
to be up and processes appear normal. Other machines can access the
cell in question and read/write to the same volume.
>> When I tried to use fstrace on the fileserver - bam kernel panics
>> right and left - I'm trying to setup a serial console to capture
>> these now.
> fstrace traces the client. the problem is in the fileserver. you're
> barking up the wrong tree.
I know - I was attempting to run the test from the fileserver machine
back to itself as a client. One of the problems with the Mac OS X
install of OpenAFS is that it is missing certain tools, and fstrace is
one of them. Since the symptoms appeared on Linux as well, I chose to
debug from one Linux box to another. That's when my previously
rock-solid fileserver (running for over a year without a crash) went
belly up as I tried to run the AFS tests: three identical OOPS panics,
and the fun of watching my RAID-5 rebuild all day.
At this point I have no idea how to trace down the source of the
problem - could it help to downgrade GCC and GLIBC to a known good
Is there any known issue with 2.4.x kernels - or with the grsecurity patch?