[OpenAFS] [1.2.7] Strange file server meltdown
Rainer Toebbicke
rtb@pclella.cern.ch
Fri, 13 Dec 2002 18:04:38 +0100
Derrick J Brashear wrote:
>
> Incidentally, about the only delta which would matter is
> STABLE12-viced-provide-way-to-not-retraverse-hostlist-20020821
>
Thanks, I had spotted that one already as it touches host.c.
However, looking at the 'host' code a bit more carefully I noticed something
else which looks dangerous:
The 'h_Hold_r(host)' macro uses an index into a bitmap to 'hold' the host on a
per-thread basis. That index is obtained through pthread_getspecific(). Most
of the threads in the fileserver (all threads created by rxi_XXXX) have a
nonzero index here.
A few do not (their index would thus be zero), and two of them actually go
through the hosts table:
1. The HostCheckLWP 5-minute-timebomb.
2. The FSYNC_askfs server in vol/fssync.c, when breaking callbacks.
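
To illustrate why a zero index might be dangerous, here is a minimal sketch of
the general pattern. It is not the OpenAFS code: every name in it (slot_key,
demo_host, demo_hold, demo_release, demo_run) is hypothetical, and it assumes
the hold bitmap is a single word with one bit per thread index. The only fact
it relies on is that pthread_getspecific() returns NULL in a thread that never
called pthread_setspecific(), so all such threads would end up on bit 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch of the pattern, not OpenAFS source. */
static pthread_key_t slot_key;
static pthread_once_t slot_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&slot_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Threads that never store a value get NULL back, i.e. index 0. */
static intptr_t current_slot(void)
{
    pthread_once(&slot_once, make_key);
    return (intptr_t)pthread_getspecific(slot_key);
}

struct demo_host { unsigned long holds; };  /* one bit per thread index */

static void demo_hold(struct demo_host *h)    { h->holds |=  (1UL << current_slot()); }
static void demo_release(struct demo_host *h) { h->holds &= ~(1UL << current_slot()); }

struct demo_arg { struct demo_host *host; intptr_t slot; int release; };

static void *demo_thread(void *p)
{
    struct demo_arg *a = p;
    pthread_once(&slot_once, make_key);
    if (a->slot != 0)                     /* slot 0 models an unregistered thread */
        pthread_setspecific(slot_key, (void *)a->slot);
    demo_hold(a->host);
    if (a->release)
        demo_release(a->host);
    return NULL;
}

/* Thread A holds and keeps its hold; thread B holds and then releases.
 * The threads run one after the other, so there is no data race here.
 * Returns the bitmap left behind. */
static unsigned long demo_run(intptr_t slot_a, intptr_t slot_b)
{
    struct demo_host h = { 0 };
    struct demo_arg a = { &h, slot_a, 0 };
    struct demo_arg b = { &h, slot_b, 1 };
    pthread_t t;

    pthread_create(&t, NULL, demo_thread, &a);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, demo_thread, &b);
    pthread_join(t, NULL);
    return h.holds;
}
```

In this sketch, demo_run(1, 2) leaves bit 1 set, so A's hold survives B's
release. demo_run(0, 0) returns 0: both threads share bit 0, and B's release
silently drops the hold that A believes it still has. Whether the real
h_Hold_r/h_Release_r pair behaves this way for the two threads listed above is
exactly what would need checking.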
These are my top suspects for next week, but perhaps somebody already knows better?
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke http://cern.ch/~rtb rtb@mail.cern.ch O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland > |
Phone: +41 22 767 8985 Fax: +41 22 767 7155 ( )\( )