[OpenAFS] Re: Afs User volume servers in VM's
Wed, 26 Oct 2011 10:32:15 -0700 (PDT)
On Wed, 26 Oct 2011, Andrew Deason wrote:
> On Wed, 26 Oct 2011 18:41:15 +0200
> Stephan Wiesand <email@example.com> wrote:
>> Booker and I would probably be ok with errors being returned upon
>> access to a single volume that's being overwhelmed with I/O requests -
>> if it just wouldn't make the fileserver as a whole grind to a halt and
>> not service any request any more.
> Well, see, it depends on _what_ is causing it to do that, as Jeffrey
> said. If the threads are hanging on a lock somewhere in the host package
> or Rx or something, this won't help a whole lot since we still have to
> go through those layers and we'll still hang on those locks (same thing
> for chewing up CPU, or moving memory around, etc). In fact, we'll do so
> even more, since we (eventually) have to go through all that at least
> twice for the VBUSY case.
The symptom we see is thread exhaustion caused by write callbacks
from many clients for a single volume. The problem is insidious
because the failure is not gradual: everything works just fine
until the number of batch jobs hits a tipping point.
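
To illustrate why this looks like a cliff rather than a slope, here is
a toy back-of-the-envelope model. It is not OpenAFS code. The pool
size, write rate, and per-callback cost are made-up numbers, and the
function names are hypothetical. It only assumes that each write to the
hot volume ties up a server thread while callbacks are broken to the
other jobs' clients.

```python
def busy_threads(jobs, pool_size=128, writes_per_job_per_sec=1.0,
                 callback_break_sec=0.01):
    """Average number of threads tied up by one hot volume (toy model)."""
    # Assumption: a write holds its thread while callbacks are broken
    # to every other job's client, so the hold time grows with job count.
    hold_time = callback_break_sec * max(jobs - 1, 0)
    # Little's law: threads in use = arrival rate * time each one is held.
    demand = jobs * writes_per_job_per_sec * hold_time
    # The pool cannot supply more threads than it has.
    return min(demand, pool_size)


def free_threads(jobs, pool_size=128):
    """Threads left over for every other volume on the same fileserver."""
    return pool_size - busy_threads(jobs, pool_size=pool_size)


if __name__ == "__main__":
    for jobs in (10, 50, 100, 120):
        print(jobs, "jobs ->", round(free_threads(jobs), 1), "threads free")
```

With these invented numbers the demand grows roughly with the square of
the job count, so the pool has plenty of slack at 10 or 50 jobs and
nothing left somewhere past 100. Once the pool is empty, requests for
unrelated volumes wait too.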
Often the culprit is a file that the user isn't even aware of
opening, a small file read by some library they are
using. Sometimes tracking down the file can take significant
- Booker C. Bense
- I'm not the stuckee when this happens, just an interested
bystander, so I may have the details slightly wrong.