[OpenAFS] Re: Vos functions and clones and shadows

Adam Megacz megacz@cs.berkeley.edu
Tue, 26 Jun 2007 15:53:47 -0700

Marcus Watts <mdw@spam.ifs.umich.edu> writes:
>> Is the "volume numbers share all but the last three bits" criterion
>> visible to the cache manager, or is this something that could be
>> altered just on the servers and admin clients (vos, bos, etc)?

> COW (copy-on-write).  They aren't really 3 separate discrete volumes.  They share
> data, and that means on the fileserver, the logic knows this at a
> deep level.

Hrm, let me see if I have this right.

  - When the fileserver wants to know if two volumes share blocks, it
    checks to see if they're in the same volume group.

  - When the fileserver wants to enumerate the set of volumes that
    share blocks with a given volume, it checks every other volume ID
    that could possibly be in its volume group.

  - A volume group is defined as the volume ID of an RW volume plus
    the two volume IDs immediately above it.

Is this correct?
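To make the question concrete, here is a rough sketch of the model in the bullets above. It is only my reading of the question and not actual fileserver code. It assumes a group is an RW ID plus the next two IDs (I'm guessing RO = RW+1 and BK = RW+2), and that a hypothetical set `rw_ids` of known RW volume IDs is available.

```python
from typing import List, Optional, Set

# Hypothetical model of a "volume group": an RW volume ID plus the two IDs
# immediately above it (assumed RO = RW+1, BK = RW+2). Not OpenAFS code.
GROUP_SIZE = 3


def volume_group(rw_id: int) -> List[int]:
    """IDs assumed to make up the group rooted at rw_id."""
    return [rw_id + offset for offset in range(GROUP_SIZE)]


def candidate_siblings(vol_id: int) -> List[int]:
    """Every other ID that *could* share a group with vol_id.

    vol_id may sit at offset 0, 1 or 2 within its group, so any positive
    ID within distance GROUP_SIZE - 1 is a candidate.
    """
    lo = vol_id - (GROUP_SIZE - 1)
    hi = vol_id + GROUP_SIZE  # range() end is exclusive
    return [v for v in range(lo, hi) if v != vol_id and v > 0]


def group_of(vol_id: int, rw_ids: Set[int]) -> Optional[List[int]]:
    """Find the group containing vol_id, given a (hypothetical) set of RW IDs."""
    for offset in range(GROUP_SIZE):
        root = vol_id - offset
        if root in rw_ids:
            return volume_group(root)
    return None


if __name__ == "__main__":
    rw = {536870912, 536870915}
    print(group_of(536870914, rw))        # BK of the first group, under this model
    print(candidate_siblings(536870913))
```

Under this model, finding siblings never needs more than a scan of a few neighbouring IDs. That is why the scheme depends on RW/RO/BK IDs being allocated consecutively.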

> The cache manager doesn't know any of this.  Volume numbers
> are completely arbitrary to it.

That is really, really good news.

> Incidentally, volume numbers don't share "all but the last 3 bits".  If
> that were true, volumes would be separated by 8, not 3.  In very old
> AFS, volume IDs were assigned on the fly and might vary much more
> randomly, but this hasn't been true for a long time and volumes that
> old seem not to behave quite right in OpenAFS.  The extra COW copies
> that Dan and Steve are creating do have "random" volume ids that
> could vary widely (actually they are assigned using a counter in
> vldb...).
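The "separated by 8, not 3" arithmetic can be checked with a short sketch. Sharing all but the low 3 bits puts IDs into 8-aligned blocks of 2^3 = 8. IDs handed out in runs of 3 (RW, RO, BK) don't line up with those blocks. The base ID 536870912 (2^29) is just an illustrative value I picked, not anything from the thread.

```python
# Illustration only: compare "share all but the low 3 bits" (8-aligned
# blocks) with IDs allocated in consecutive runs of 3 (RW, RO, BK).
LOW_BITS = 3
MASK = ~((1 << LOW_BITS) - 1)  # clears the low 3 bits (Python ints: ~7 == -8)


def shares_high_bits(a: int, b: int) -> bool:
    """True if a and b differ only in their low 3 bits."""
    return (a & MASK) == (b & MASK)


def all_share_high_bits(ids):
    return all(shares_high_bits(ids[0], v) for v in ids[1:])


BASE = 536870912  # 2**29, chosen so the first block is 8-aligned
groups = [[BASE + start + i for i in range(3)] for start in range(0, 12, 3)]

if __name__ == "__main__":
    for g in groups:
        print(g, all_share_high_bits(g))
```

The third run (…918, …919, …920) straddles an 8-aligned boundary. The first two runs, meanwhile, land in the same 8-block even though they are different groups. So the bit mask can't be what defines a volume group.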

Okay, I guess I was confused then.  Does this mean that volume groups
consist only of RW, RO, and a "special clone" called BK, which is part
of its RW's volume group (whereas all other clones are not)?

And does it also mean that the sole purpose of volume groups is to
know what volumes to get rid of when an RW is removed?

> The "all but the last 3 bits" bit has to do with some of
> the internal book-keeping logic inside the namei fileserver,
> where it has clever bitpacking for stuff.  This is mostly
> used for "per-inode" data, such as the stuff in the link table.

> The inode based fileserver does not have the "last 3 bits" problem,
> and could have many more cow copies.  Too bad it's such a hack.
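The kind of "clever bitpacking" described for the namei link table can be sketched generically. Several small counters are packed into one fixed-width word, so the field width caps both the count values and how many slots fit in a word. The 3-bit fields, the 16-bit word and the names `get_count`/`set_count` are all my own assumptions for illustration. This is not the actual namei on-disk layout.

```python
# Hypothetical packed-counter layout, NOT the real namei link-table format:
# several 3-bit counters stored side by side in one 16-bit word.
FIELD_BITS = 3
FIELD_MASK = (1 << FIELD_BITS) - 1         # 0b111, so each count is 0..7
WORD_BITS = 16
SLOTS_PER_WORD = WORD_BITS // FIELD_BITS   # 5 slots; the top bit is unused


def _check_slot(slot: int) -> None:
    if not 0 <= slot < SLOTS_PER_WORD:
        raise IndexError(f"slot {slot} out of range 0..{SLOTS_PER_WORD - 1}")


def get_count(word: int, slot: int) -> int:
    """Extract the 3-bit counter stored in the given slot."""
    _check_slot(slot)
    return (word >> (slot * FIELD_BITS)) & FIELD_MASK


def set_count(word: int, slot: int, count: int) -> int:
    """Return a copy of word with the given slot's counter replaced."""
    _check_slot(slot)
    if not 0 <= count <= FIELD_MASK:
        raise ValueError(f"count {count} does not fit in {FIELD_BITS} bits")
    shift = slot * FIELD_BITS
    return (word & ~(FIELD_MASK << shift)) | (count << shift)


if __name__ == "__main__":
    w = set_count(0, 0, 1)
    w = set_count(w, 2, 3)
    print(w, [get_count(w, s) for s in range(SLOTS_PER_WORD)])
```

With a layout like this, supporting more copies means widening the fields or the word, which changes the on-disk format. That fits the point that such changes would stay inside the namei-specific code.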

Fascinating.  That also means that any code which would need to be
altered to support >4 clones would be confined to the namei-specific
part of the fileserver code.

  - a

PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380