[OpenAFS] Fileserver loses contact with itself

Mattias Amnefelt mattiasa@kth.se
Wed, 19 Nov 2003 15:31:18 +0100


> 3) assuming you have a cc -g (debug) build running, attach with gdb
> while the server is hanging, and execute the following:
> thread apply all where
> 
> save the output to a file, put it somewhere public, and send mail
> about it.

The output of "thread apply all where" for a hung fileserver is at
http://www.e.kth.se/~mattiasa/openafs/thread_apply_all_where or
/afs/e.kth.se/home/staff/mattiasa/public_html/openafs/thread_apply_all_where
(the same file, via the web and via AFS).
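
For anyone who wants to capture the same kind of listing without an
interactive session, here is a rough sketch. It assumes a gdb recent
enough to support -batch, -ex and -p; the function names and the output
path are made up for illustration and are not part of OpenAFS.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to pid, prints every
    thread's stack with 'thread apply all where', and then exits."""
    return ["gdb", "-batch", "-ex", "thread apply all where", "-p", str(pid)]


def dump_threads(pid, outfile):
    """Run gdb against pid and save stdout and stderr to outfile.

    Returns gdb's exit status. The target process is stopped while
    gdb is attached and resumes when gdb exits.
    """
    with open(outfile, "w") as out:
        result = subprocess.run(
            gdb_backtrace_cmd(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python dump_threads.py <fileserver pid> <output file>
    sys.exit(dump_threads(int(sys.argv[1]), sys.argv[2]))
```

Attaching stops the fileserver for a moment, so the caveat further down
about the server recovering after being poked applies here as well.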

This fileserver is running 1.2.10 on Red Hat 9 with the Red Hat kernel
2.4.20-20.9smp. We've seen this two or three times, and I know of two
occasions where it happened while at least one release operation
was going on. The volserver answers normally, but the fileserver
becomes unresponsive. It doesn't answer rxdebug, and a tcpdump shows
that it doesn't answer any rx packets.
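
The check I'm describing could be scripted roughly like this. It is only
a sketch: it assumes rxdebug is on the PATH, that the fileserver and
volserver listen on the standard ports 7000 and 7005, and that rxdebug
exits non-zero when it gets no reply (I haven't verified that last part
on every build, so test it on yours). The helper names and the host name
are hypothetical.

```python
import subprocess

FILESERVER_PORT = 7000  # standard AFS fileserver port
VOLSERVER_PORT = 7005   # standard AFS volserver port


def rxdebug_cmd(host, port):
    """Build an rxdebug command line asking one server for its version."""
    return ["rxdebug", "-servers", host, "-port", str(port), "-version"]


def responds(host, port, timeout=10, run=subprocess.run):
    """Return True if rxdebug gets an answer within timeout seconds.

    Assumption: a timeout or a non-zero exit status means "no answer".
    The run parameter exists only so the check can be tested without
    a real server.
    """
    try:
        result = run(
            rxdebug_cmd(host, port),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    host = "afs-server.example.org"  # hypothetical host name
    print("fileserver:", responds(host, FILESERVER_PORT))
    print("volserver: ", responds(host, VOLSERVER_PORT))
```

In the hang described above, this would be expected to report the
volserver as answering and the fileserver as not.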

On one occasion I sent SIGXCPU to the fileserver to get some debug info,
and when I looked after I had sent the signal, the fileserver had
started to respond again. On another occasion I attached with the
debugger and got the thread listing above, and when the fileserver
continued it had started responding. I don't know whether my actions
are related to the behaviour or not, though.

/mattiasa