[OpenAFS-devel] Fileserver deadlock after callback handling error

Rainer Többicke Rainer.Toebbicke@cern.ch
22 Mar 2004 11:37:57 +0100


Does anybody happen to have an idea on how to handle the following?


viced/callback.c calls 'ShutDown()' or 'ShutDownAndCore()' whenever
something goes wrong in its internal logic.

The problem is that you cannot shut down the fileserver at an arbitrary
moment, in particular not while you're accounted as V_inUse for a volume,
which is quite likely when you're in AddCallBack1() and thereabouts: the
shutdown code will attempt to "VOffline" the volume and deadlock waiting
for the thread that initiated the shutdown to release it.
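
To make the cycle concrete, here is a toy model of it. This is only a
sketch and not the real viced code: struct toy_volume, toy_volume_get()
and toy_offline_would_self_deadlock() are made-up names, and the real
fileserver tracks volume usage differently.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_HOLDERS 8

/* Hypothetical stand-in for a volume: records which threads are
 * currently accounted as using it (the V_inUse idea, simplified). */
struct toy_volume {
    int holders[TOY_MAX_HOLDERS]; /* ids of threads using the volume */
    int n_holders;
};

/* A thread starts using the volume. */
void toy_volume_get(struct toy_volume *v, int thread_id)
{
    assert(v->n_holders < TOY_MAX_HOLDERS);
    v->holders[v->n_holders++] = thread_id;
}

/* Taking the volume offline means waiting until every holder has let
 * go. If the waiting thread is itself a holder, it waits for itself
 * and the wait can never finish. */
bool toy_offline_would_self_deadlock(const struct toy_volume *v,
                                     int waiting_thread)
{
    for (int i = 0; i < v->n_holders; i++) {
        if (v->holders[i] == waiting_thread)
            return true;
    }
    return false;
}
```

In this model, a thread that has called toy_volume_get() and then runs
the shutdown path itself is exactly the case where the predicate is true.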

Possible fixes:

1. time out eventually in VOffline - but what about the volume?
Shouldn't it at least be flagged to be salvaged?

2. don't do the shutdown on your own, rather signal the main thread. But
then what? Return from the 'AddCallback()', but with what?

3. Drop this attempt to recover gracefully and crash hard instead? It
doesn't happen very often, and in any case it means something was
definitely screwed up. I haven't actually understood yet what went wrong
in the callback hashes; for now I'm more concerned with keeping the
fileserver from hanging at all.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke 
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155