[OpenAFS] [1.2.7] Strange file server meltdown
Rainer Toebbicke
rtb@pclella.cern.ch
Fri, 13 Dec 2002 10:05:26 +0100
Russ Allbery wrote:
> Hello folks,
>
> We're running OpenAFS 1.2.7 on Solaris 8, and are seeing an unusual
> problem. Two of our file servers are periodically going into an
> apparently load-related meltdown around 3:30am to 4:00am at fairly
> unpredictable intervals. We're having about one instance of this a week.
>
...
We've seen similar problems twice in the last two weeks: Solaris 8, OpenAFS
1.2.7, and the fileserver has '3252 calls waiting for a thread' and
'2 threads are idle'. All clients on it hang, the system is at 100% CPU, and
'bos restart' writes a message to FileLog but then nothing happens.
I took a 'gcore', a snoop snapshot, and rxdebug output while the server was in
that state. I also ran truss: the server was *only* doing send/receive, no
disk I/O, nothing else.
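For anyone wanting to watch for this state automatically, here is a minimal sketch that pulls the two counters quoted above ('N calls waiting for a thread', 'N threads are idle') out of captured text. The regular expressions assume the wording shown in this message; the function names and the threshold of 100 waiting calls are hypothetical, chosen only for illustration.

```python
import re

# Patterns for the two status lines quoted above; the exact layout of
# real rxdebug output may differ, so treat these as assumptions.
WAITING_RE = re.compile(r"(\d+)\s+calls waiting for a thread")
IDLE_RE = re.compile(r"(\d+)\s+threads are idle")


def parse_thread_status(text):
    """Return (calls_waiting, threads_idle); None for a value not found."""
    waiting = WAITING_RE.search(text)
    idle = IDLE_RE.search(text)
    return (
        int(waiting.group(1)) if waiting else None,
        int(idle.group(1)) if idle else None,
    )


def looks_stuck(text, max_waiting=100):
    """Hypothetical heuristic: flag a backlog above max_waiting calls."""
    waiting, _idle = parse_thread_status(text)
    return waiting is not None and waiting > max_waiting


if __name__ == "__main__":
    sample = "3252 calls waiting for a thread\n2 threads are idle\n"
    print(parse_thread_status(sample))
    print(looks_stuck(sample))
```

A check like this could be run periodically against saved rxdebug output, so that a gcore is taken as soon as the backlog builds up rather than after clients start hanging.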
Luckily we always run everything compiled with '-g': in the gcore, all threads
other than the usual 'maintenance' ones were waiting on host_glock_mutex,
apart from the one that held it in h_TossStuff_r. I walked down the host chain
it was 'tossing', but there was no obvious sign of a tight loop, as the chain
ended after a couple of dozen entries. A pity that I did not take another
gcore a few seconds later.
Actually, we're now busy reverting to OpenAFS 1.2.6! Circumstantial evidence
only - a number of problems appeared shortly after upgrading to 1.2.7:
1. this hang, twice
2. an assertion failure in host.c GetClient(). There is obviously a window for
a race condition in h_GetHost_r. The question is whether this is related to 1.
above.
3. TWO crashes each night around 4:30 in rxi_AttachServerProc() on
queue_Remove(call) with call==NULL.
I'll post details about 2 & 3 above as soon as I am able to collect enough
evidence.
In the meantime: how can I find out what deltas went into 1.2.7 after 1.2.6?
Globally, I mean, not on a per-source-file level?
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke http://cern.ch/~rtb rtb@mail.cern.ch O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland > |
Phone: +41 22 767 8985 Fax: +41 22 767 7155 ( )\( )