[OpenAFS-devel] 1.4.0-rc4 weirdness

Robert Banz banz@umbc.edu
Tue, 15 Nov 2005 17:58:10 -0500


Jim Rees wrote:
>   The interesting thread will probably be the CheckHost thread.
> 
> Maybe, and I've got a patch for this I'm testing now.  But I think this is a
> different problem.  The bug I'm chasing makes all the worker threads hang
> waiting for more space in the callback table.  The server eventually
> recovers.
> 
> The problem Christopher describes shows many calls waiting for a thread, and
> yet the pstack shows many threads waiting for a call.  And the server never
> recovers.  Looks like the worker threads aren't waking up, or aren't finding
> the calls when they do.
> 
> The CheckHost loop does hog host locks, but only one at a time.

Yeah, the pstack output I have shows the CheckHost thread being idle at 
the time, so CheckHost might not be the culprit.

Well, if one of my servers yarfs tomorrow (which one probably will), 
I'll have a core to examine from a fileserver & libraries built with 
"-g", so we might be able to do a bit more research.

-rob