[OpenAFS] Re: Afs User volume servers in VM's

Stephan Wiesand stephan.wiesand@desy.de
Wed, 26 Oct 2011 18:41:15 +0200

On Oct 26, 2011, at 17:57 , Andrew Deason wrote:

> On Wed, 26 Oct 2011 17:26:55 +0200
> Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>> Running multiple fileservers on different ports on the same system
would be even more efficient. Is this possible, or could it be
(in theory)?
> In theory, of course. You just need a way in the fileserver to specify
> the port, vldb modifications to store a port for the fileserver, and the
> protocol modifications to communicate a port to the vldb and to clients.
> You know, "just" all that :)

I had a feeling that it's not easy, but had to ask ;-)

> The prerequisites for the vldb
> modifications have already been discussed a bit on the standardization
> list.
> That would of course limit the amount of clients that can actually
> access that fileserver, since no clients today can do any of that.

As long as old clients could still access volumes through the process
listening on 7003, that would be no problem at all in practice even if
the old contention issues were still present. In our environment, it
would even be ok if certain volumes were accessible by "new" clients
only, as long as the "old" clients don't crash or hang their systems but
return a meaningful error message (like "please update your AFS client
to access this data").
>> What would be a great feature to have is a way to keep the server from
>> using more than, say, half of the available threads for a single
>> volume. Would this be feasible to implement at all?
> Sure. Well, sort of. The server can obviously keep track of which calls
> are associated with what volume (and already does post-1.6, for
> -offline-timeout functionality), so if the number of calls is greater
> than "X", we can just not service the request.
> However, the way to do that is to return a VBUSY error to the client
> ("busy; try again later"),

Well, isn't this the right answer in that situation?

> which causes the clients to sleep and retry
> after some number of seconds (and makes them log those "busy waiting for
> volume" messages). And we only do that after we've obtained some kind of
> reference to the relevant volume, which is after a considerable amount
> of processing has been done.

If some threads are left for processing this, because they're not all
tied up servicing I/O for a single volume (or waiting for a callback on
it to be broken), that would usually not be a problem.

> Maybe we could check that a bit earlier,
> but the point is we'd have to receive the call and return an error if
> we're over quota for the volume, which takes some extra processing; we
> can't, like, direct calls dealing with volumes to a certain subset of
> threads or something. But I guess that's probably not a problem; it
> could make things worse in some situations but better in others.

Booker and I would probably be ok with errors being returned upon
access to a single volume that's being overwhelmed with I/O requests,
if it just wouldn't make the fileserver as a whole grind to a halt and
stop servicing any requests at all.

Stephan Wiesand
Platanenallee 6
15738 Zeuthen, Germany