[OpenAFS-devel] meaning of VNOVOL, VOFFLINE, etc.

Tom Keiser tkeiser@sinenomine.net
Tue, 8 May 2012 13:55:34 -0400


On Fri, May 4, 2012 at 6:35 PM, Jeffrey Altman
<jaltman@your-file-system.com> wrote:
> Tom:
>
> The 1991 Zayas specifications are lacking in many regards.  For
> starters, the Vxxx error codes are only defined for the Vol/VL RPCs and
> not for the FS/CM RPCs.  The use of the Vxxx error codes in the FS/CM
> RPCs is left undefined, and yet those errors are reported to cache
> managers by file servers.
>

Hi Jeff,

I was wondering if you were going to raise this distinction.  It is
indeed troubling how little the FS/CM document has to say about
this issue, and about many others.

> I think it was 2004 or perhaps early 2005 when a large user was
> concerned about VLDB scalability due to the introduction of tens of
> thousands of Windows clients into the environment.  Each time a VNOVOL,
> VMOVED, VOFFLINE, VSALVAGE or VNOSERVICE error was received, the Windows
> client would query the VLDB and retry the request after 2 seconds.  If a
> volume couldn't be served from a file server, this process would be
> repeated.  This was exacerbated by the behavior of the Explorer Shell,
> which reads the contents of the directories it displays in search of
> various metadata.  As a result, the VLDB servers were struggling under
> the load.  It wasn't going to be possible to make the VLDB servers
> process more requests, so it was important to reduce the number of
> requests that were sent.
>
> The discussions that took place concluded that the description of
> VNOVOL was ambiguous and that, based upon usage, its meaning should be
> that the volume is not present.  With that interpretation a client
> could restrict the number of VLDB lookups for a volume.  I do not
> remember if these discussions took place at a hackathon, a workshop, or
> on Zephyr.  This use of the error codes made no difference to deployed
> clients, since they acted on all of these error codes identically, nor
> did it amount to a protocol change, given the existing use in the file
> server.
>

Thanks for the explanation; this is precisely what I was hoping to
find out.  Given this, I will push the VNOVOL changes.
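
To make the distinction concrete, here is a rough sketch in C of how a
cache manager might act on that reading of VNOVOL.  It is not the
Windows client's code, nor the VNOVOL change itself; struct
volume_state, legacy_policy(), vnovol_policy(), MAX_SITES and the
per-volume lookup cap are all hypothetical, and the numeric error values
are assumed to match OpenAFS's definitions (check the real headers).

```c
#include <assert.h>
#include <stdbool.h>

/* Volume error codes named in this thread.  The numeric values are
 * assumed to match OpenAFS's definitions; verify against the headers. */
enum vol_error {
    VSALVAGE   = 101,
    VNOVOL     = 103,
    VNOSERVICE = 105,
    VOFFLINE   = 106,
    VMOVED     = 111
};

#define MAX_SITES 8   /* hypothetical bound on replica sites */

/* Hypothetical per-volume cache manager state. */
struct volume_state {
    int  nsites;             /* sites returned by the last VLDB lookup */
    bool absent[MAX_SITES];  /* site answered VNOVOL since that lookup */
    int  vldb_lookups;       /* VLDB lookups issued for this volume */
};

enum action { RETRY_OTHER_SITE, QUERY_VLDB_THEN_RETRY, FAIL };

/* Behavior described for the older Windows client: each of these errors
 * led to a VLDB query followed by a retry (after 2 seconds), with no
 * limit on how often that happened. */
enum action legacy_policy(int code)
{
    switch (code) {
    case VNOVOL:
    case VMOVED:
    case VOFFLINE:
    case VSALVAGE:
    case VNOSERVICE:
        return QUERY_VLDB_THEN_RETRY;
    default:
        return FAIL;
    }
}

/* Sketch: read VNOVOL as "the volume is not present on this server"
 * and bound the number of VLDB lookups per volume.  Other errors keep
 * the legacy handling. */
enum action vnovol_policy(struct volume_state *v, int site,
                          int code, int max_lookups)
{
    assert(v->nsites <= MAX_SITES && site >= 0 && site < v->nsites);

    if (code != VNOVOL)
        return legacy_policy(code);

    /* This server lacks the volume; another known site may have it. */
    v->absent[site] = true;
    for (int i = 0; i < v->nsites; i++)
        if (!v->absent[i])
            return RETRY_OTHER_SITE;

    /* Every known site reports the volume absent, so the cached location
     * is probably stale: allow a VLDB lookup, but only a bounded number. */
    if (v->vldb_lookups >= max_lookups)
        return FAIL;
    v->vldb_lookups++;
    for (int i = 0; i < MAX_SITES; i++)
        v->absent[i] = false;  /* caller refreshes nsites from the lookup */
    return QUERY_VLDB_THEN_RETRY;
}
```

With two sites and a cap of one lookup, a VNOVOL from the first site
sends the client to the second; only when both report VNOVOL does it go
back to the VLDB, and a second full round of VNOVOLs fails instead of
querying again.  The point is that a VLDB query is only worth its cost
when every cached location has been ruled out, which is the kind of
restriction the VNOVOL interpretation allows.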

> Perhaps others can find a reference in Zephyr logs.  I no longer have
> access to them.
>

I, personally, don't need anyone to go to that level of effort.

This discussion raises a question: how do we want to clarify the
documentation so that the various meanings of each error code (and in
which context each meaning applies) are clear?  Can I just push
changes to fs-cm-spec.h and vldb-vol-spec.h into gerrit?  Or, would
the community prefer that afs3-stds have a chance to review any
language changes before they are pushed?  Ideally, I'd want to codify
this in an I-D describing how clients should behave in the face of rx
aborts, but I don't foresee having time to do that anytime soon.

Regards,

-Tom