[OpenAFS] Re: httpd blocked

Andrew Deason adeason@sinenomine.net
Thu, 23 Jun 2011 10:44:45 -0500


On Mon, 20 Jun 2011 16:19:01 -0700
Jonathan Nilsson <jnilsson@uci.edu> wrote:

> i suspect that the system kept spawning httpd processes as old ones
> got blocked and eventually it ran out of memory and became
> unresponsive. after a reboot it works fine. so the question is, what
> caused the afs cache manager to respond so slow?
> 
> can anyone confirm if they have seen kernel messages like this? how
> can i confirm if the problem is with the client or the server? i see
> no error messages in BosLog, FileLog, or VolserLog on our servers...

If the processes were hanging forever or for a very long time, it's not
likely to be the fault of any server, since the client doesn't wait
around forever for a response. I assume there were no messages about
losing contact with file or vl servers in the client logs around that
time?

It's easier to tell what went wrong if we know what the rest of the
system is doing when that happens. If you ever catch it in that state,
running 'echo t > /proc/sysrq-trigger' as root will dump the state and
kernel stack of every task to the kernel log, which normally ends up in
syslog. That's a lot of output, only some of it useful. Or if you can
get the machine to dump core, that's the most useful thing, but a core
can contain sensitive data, so you don't want to just go giving that
out to anybody.
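
If a full sysrq dump is more than you want to wade through, a rough
sketch like the one below can give a first look at which processes are
stuck. It is not part of the original advice, and it assumes a Linux
/proc layout and Python 3. The function names parse_stat and
blocked_tasks are made up for illustration. It lists tasks in
uninterruptible sleep (state D) along with the kernel symbol they are
waiting in, if the kernel exposes one in /proc/<pid>/wchan.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for processes in state D under `proc`."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return  # no /proc here (assumption: not a Linux system)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"  # wchan may be missing or restricted
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid}\t{comm}\t{wchan}")
```

A pile of httpd processes all waiting in the same place would be a
hint about where to look, though it only shows one symbol per process
and is no substitute for the full stack traces from sysrq-t or a core.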

-- 
Andrew Deason
adeason@sinenomine.net