[OpenAFS] Re: Fileserver machine freezes for 4 seconds every 12 seconds

Andrew Deason adeason@sinenomine.net
Wed, 4 Apr 2012 10:17:56 -0500


On Tue, 3 Apr 2012 20:19:18 -0700
Ken Elkabany <Ken@Elkabany.com> wrote:

> We're noticing an odd behavior while SSH-ed into the file servers.
> Every 12 seconds, the fileserver and volserver hit 100% CPU usage, and
> our SSH terminals freeze for about 4 seconds. While we often have 200+
> clients actively using our two fileservers, this occurs even when we
> have only about 40.

Actively "using"... reading, writing, what? Do you know?

> If the fileservers are restarted, the issue goes away for some
> time, but then returns after 30 minutes to an hour. What's the best
> way to diagnose this issue? I've been using xstat and afsmonitor, but
> they aren't very revealing in this situation.

This is Linux?

This could just be related to the other client issue in that 'ProbeUuid
for host failed' thread. Alternatively, you can try this patch on the
server:

<http://git.openafs.org/?p=openafs.git;a=patch;h=0e5c743b609c8d719c74eeefc7d7ecb0cf86a82d>

Otherwise, I don't know if any of the easy network-accessible stats are
going to tell you much. If you can get a core of the fileserver process
('gcore <fileserver_pid>') during that 4-second window, we/you/somebody
can look at the thread backtraces and say what it's doing at the time.
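
Since the stall only lasts about 4 seconds out of every 12, hitting it
with a single manual gcore can take a few tries. A rough sketch of one way
to take several cores a few seconds apart is below. The function names,
the output directory, and the count/interval defaults are made up for
illustration; only 'gcore -o <prefix> <pid>' comes from gdb itself. Note
that gcore stops the process while it writes the dump, so the spacing
between snapshots will drift on a large fileserver.

```shell
#!/bin/sh
# Sketch only: core_prefix and capture_cores are hypothetical helper names,
# and the defaults (4 snapshots, 3 seconds apart) are assumptions chosen so
# that a 12-second cycle is sampled more than once.

# Print the output prefix for snapshot number $2 under directory $1.
core_prefix() {
    printf '%s/fileserver.%s' "$1" "$2"
}

# capture_cores PID OUTDIR [COUNT] [INTERVAL]
# Runs gcore COUNT times against PID, sleeping INTERVAL seconds in between.
capture_cores() {
    pid=$1
    outdir=$2
    count=${3:-4}
    interval=${4:-3}
    i=1
    while [ "$i" -le "$count" ]; do
        # gcore appends ".<pid>" to the prefix given with -o
        gcore -o "$(core_prefix "$outdir" "$i")" "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

It could be run as 'capture_cores <fileserver_pid> /var/tmp', and each
resulting core inspected with something like
'gdb -batch -ex "thread apply all bt" <path_to_fileserver_binary> <core>'.
The binary path depends on how OpenAFS was installed, so it is left as a
placeholder here.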

-- 
Andrew Deason
adeason@sinenomine.net