[OpenAFS] Re: Afs User volume servers in VM's

Andrew Deason adeason@sinenomine.net
Wed, 26 Oct 2011 12:59:05 -0500


On Wed, 26 Oct 2011 10:32:15 -0700 (PDT)
Booker Bense <bbense@slac.stanford.edu> wrote:

> The symptom we see is thread exhaustion due to write callbacks from
> many clients for a single volume[1]. The problem is insidious as it's
> not a gradual failure, because everything works just fine until you
> hit a tipping point in the number of batch jobs.

That is helpful information. We could probably do some quota enforcement
on vnode locks like we do for host locks, and/or I've sometimes wondered
if only waiting a certain amount of time would make sense ("if you
haven't got the lock in X seconds, VBUSY"). Limiting the number of
active requests for a single volume could also help with that case, but
limiting per-vnode could allow you to have more stringent quotas without
adversely affecting legit uses too much.
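
To make the timeout-plus-quota idea concrete, here is a rough sketch in
Python rather than fileserver C. Nothing here is existing OpenAFS code:
the VnodeLock class, max_waiters, and timeout_secs are hypothetical names
made up for illustration, and the VBUSY value is a placeholder for
whatever error the fileserver would actually return.

```python
import threading

# Assumption: placeholder for the fileserver's "busy" error code.
VBUSY = 110


class VnodeLock:
    """Hypothetical per-vnode lock with a waiter quota and a bounded wait."""

    def __init__(self, max_waiters, timeout_secs):
        self._lock = threading.Lock()   # the vnode lock itself
        self._guard = threading.Lock()  # protects the waiter count
        self._waiters = 0
        self.max_waiters = max_waiters
        self.timeout_secs = timeout_secs

    def acquire(self):
        """Return 0 on success, VBUSY if over quota or the wait timed out."""
        with self._guard:
            # Quota check: refuse right away instead of tying up a thread.
            if self._waiters >= self.max_waiters:
                return VBUSY
            self._waiters += 1
        try:
            # Bounded wait: "if you haven't got the lock in X seconds, VBUSY".
            if self._lock.acquire(timeout=self.timeout_secs):
                return 0
            return VBUSY
        finally:
            with self._guard:
                self._waiters -= 1

    def release(self):
        self._lock.release()
```

A request thread would call acquire() before touching the vnode and hand
VBUSY back to the client on failure, so a pile of writers on one vnode
costs at most max_waiters threads for at most timeout_secs each.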

There are also some improvements to make callback breaks go faster in (I
think) the post-1.6 code that might make it into 1.6 sometime, but those
don't really get at the underlying DoS issue.

-- 
Andrew Deason
adeason@sinenomine.net