[OpenAFS] [1.2.7] Strange file server meltdown

Rainer Toebbicke rtb@pclella.cern.ch
Fri, 13 Dec 2002 18:04:38 +0100


Derrick J Brashear wrote:

> 
> Incidentally, about the only delta which would matter is
> STABLE12-viced-provide-way-to-not-retraverse-hostlist-20020821
> 

Thanks, I had spotted that one already as it touches host.c.

However, looking at the 'host' code a bit more carefully, I noticed something 
else which looks dangerous:

The 'h_Hold_r(host)' macro uses an index into a bitmap to 'hold' the host on a 
per-thread basis. That index is obtained through pthread_getspecific(). Most 
of the threads in the fileserver (all threads created by rxi_XXXX) have a 
nonzero index here.

A few do not (their index would thus be zero), and two of them actually go 
through the hosts table:

1. The HostCheckLWP 5-minute timebomb.
2. The FSYNC_askfs server in vol/fssync.c, when breaking callbacks.
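
To make the concern concrete, here is a minimal sketch of how two threads 
without an index could end up sharing bit 0 of a per-thread hold bitmap. It is 
not the OpenAFS code: the names index_key, host_holds, hold(), release() and 
run_demo() are all made up for illustration, and the real h_Hold_r macro may 
well differ in detail. The only thing taken as given is that 
pthread_getspecific() returns NULL for a key the calling thread never set. The 
workers run one after another so the outcome is deterministic; in a real 
server they would run concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the OpenAFS source. */
static pthread_key_t index_key;
static pthread_once_t index_key_once = PTHREAD_ONCE_INIT;
static uint32_t host_holds;   /* one "hold" bit per thread index */

static void make_key(void)
{
    int rc = pthread_key_create(&index_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* NULL (key never set by this thread) converts to index 0. */
static unsigned my_index(void)
{
    return (unsigned)(uintptr_t)pthread_getspecific(index_key);
}

static void hold(void)    { host_holds |=  (1u << my_index()); }
static void release(void) { host_holds &= ~(1u << my_index()); }

/* Worker with an assigned index: takes a hold and keeps it. */
static void *indexed_holder(void *arg)
{
    pthread_setspecific(index_key, arg);
    hold();
    return NULL;
}

/* Worker without an index: takes a hold and keeps it. */
static void *plain_holder(void *arg)
{
    (void)arg;
    hold();
    return NULL;
}

/* Another worker without an index: takes a hold, then drops it. */
static void *plain_hold_release(void *arg)
{
    (void)arg;
    hold();
    release();
    return NULL;
}

/* Start one worker and wait for it to finish. */
static void run_and_join(void *(*fn)(void *), void *arg)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, fn, arg);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
}

/* Runs the three workers in sequence and returns the final bitmap. */
uint32_t run_demo(void)
{
    pthread_once(&index_key_once, make_key);
    host_holds = 0;
    run_and_join(indexed_holder, (void *)(uintptr_t)3);
    run_and_join(plain_holder, NULL);
    run_and_join(plain_hold_release, NULL);
    return host_holds;
}
```

In this sketch the thread with index 3 keeps its bit, but the hold taken by 
plain_holder is wiped out when plain_hold_release clears bit 0, although 
plain_holder never released anything. If the fileserver threads without an 
index behave like this, one of them could drop a hold that another still 
relies on.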

Top suspects for next week, but perhaps somebody already knows better?

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke        http://cern.ch/~rtb         rtb@mail.cern.ch  O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland   > |
Phone: +41 22 767 8985       Fax: +41 22 767 7155                     ( )\( )