[OpenAFS-devel] 1.4.0-rc4 weirdness
Robert Banz
banz@umbc.edu
Tue, 15 Nov 2005 17:58:10 -0500
Jim Rees wrote:
> The interesting thread will probably be the CheckHost thread.
>
> Maybe, and I've got a patch for this I'm testing now. But I think this is a
> different problem. The bug I'm chasing makes all the worker threads hang
> waiting for more space in the callback table. The server eventually
> recovers.
>
> The problem Christopher describes shows many calls waiting for a thread, and
> yet the pstack shows many threads waiting for a call. And the server never
> recovers. Looks like the worker threads aren't waking up, or aren't finding
> the calls when they do.
>
> The CheckHost loop does hog host locks, but only one at a time.
Yeah, the pstack output I have shows the CheckHost thread being idle at
the time, so it might not be that.
Well, if one of my servers yarfs tomorrow (which one probably will),
I'll have a core to examine from a fileserver & libraries built with
"-g", so we might be able to do a bit more research.
-rob