[OpenAFS-devel] prdb format extension for extended authentication names
Benjamin Kaduk
kaduk@MIT.EDU
Thu, 30 May 2013 11:39:11 -0400 (EDT)
jhutz found some old notes as well, the main thing I take from them being
backwards compatibility.
Up to now I had been concerned only with staying within the existing
block structure, which is fairly intertwined with backwards compatibility.
I'll try to list the design considerations I know of for a prdb extension,
and say a bit about some of them: backwards and forwards compatibility,
the ability for db-maintenance tools to recover from (e.g.) hash table
corruption, and preserving existing invariants come to mind. Also, jhutz
has just about convinced me that it is irresponsible to use the last spare
field for a specific extension (as opposed to a general extensible
structure), even if we think that a full format revision is coming "soon",
so that should be added to the design considerations list.
Are there other considerations that we should take into account?
Within these considerations (and any others that come up in this
discussion), I am working on a concrete proposal. Would people prefer to
see this in the form of ptserver.h struct declarations and comments, or an
addition to the prdb format writeup I have at
https://github.com/kaduk/openafs/blob/prdb/doc/txt/prdb.txt ?
Per-consideration notes:
%%%
Backwards compatibility is pretty easy: all we have to do is not touch
existing structures and stick strictly to extensions of the existing
format. Then the new code will handle existing databases just fine.
%%%
It is strongly desirable to have forwards compatibility, namely, an old
ptserver should not choke on or scribble over the new style entries. It
is hard to guarantee that an old ptserver will not see new style entries
without updating all dbservers at once, and there are operational reasons
to want to phase in new code. By lucky chance, forward compatibility is
possible -- the old code recognizes PRFOREIGN and PRINST in the flags
field as being valid entries, but does not generate them. This lets us
steal one of these bits, say PRINST, to indicate that an entry is an
"extended entry", and within such extended entries use the unallocated
flags bits to distinguish between types of entries. There are eight
unused "type flags" bits, though perhaps we need not claim all of them,
particularly if we use them as an integral enumeration of types and not as
flag bits. I'm not entirely sure what other types of extended entries we
might want and whether the enum treatment is appropriate. The old notes
I'm looking at sketch out a generic "optentry" to hold "option blocks",
with a field for what kind of option and an afsUUID to which they belong
(to prevent option blocks from being incorrectly reused when a pts id is
recycled), but I'm not tied to that. The comments indicate it could be
used for supergroup information if someone wanted to clean up/reimplement
that code.
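To make the flag scheme above a bit more concrete, here is a rough sketch in
C of the "integral enumeration" variant. Everything with an SK_/sk_ prefix is
hypothetical: the bit value used for PRINST, the choice of bits 8..15 as the
unused type-flags bits, and the enum members are assumptions for illustration
only, not copied from ptserver.h or from any agreed design.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real definitions live in ptserver.h. */
#define SK_PRINST    0x0020u  /* bit stolen to mean "extended entry" */
#define SK_EXT_SHIFT 8        /* assume the unused type-flags bits are 8..15 */
#define SK_EXT_MASK  0xff00u

/* Hypothetical enumeration of extended entry types. */
enum sk_ext_type {
    SK_EXT_NONE = 0,
    SK_EXT_AUTHNAME = 1,  /* extended authentication name */
    SK_EXT_OPTION = 2     /* generic "option block" */
};

/* Nonzero if the flags mark the block as an extended entry. */
static int sk_is_extended(uint32_t flags)
{
    return (flags & SK_PRINST) != 0;
}

/* Extract the enumerated type from an extended entry's flags. */
static enum sk_ext_type sk_ext_type_of(uint32_t flags)
{
    assert(sk_is_extended(flags));
    return (enum sk_ext_type)((flags & SK_EXT_MASK) >> SK_EXT_SHIFT);
}

/* Build the low 16 "type flags" bits for a new extended entry. */
static uint32_t sk_make_ext_flags(enum sk_ext_type type)
{
    assert((unsigned)type <= 0xffu);
    return SK_PRINST | ((uint32_t)type << SK_EXT_SHIFT);
}
```

Treating the eight bits as an enumeration gives up to 255 entry types, where
independent flag bits would give only eight; old code that only tests the
PRINST bit should be unaffected either way.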
%%%
The following fields are invariant in all existing entry structures;
retaining them should allow old ptservers to recognize (and print, to some
extent) the new entries we add:
flags (really only the low 16 bits, which I call "type_flags" in my format writeup)
id
cellid
next
Note that cellid is only rarely used.
Flags including PRINST will tell old code that this block is allocated,
and next allows a utility reading the database to follow the chain of
blocks in the same logical structure, even if it does not know exactly how
to interpret those blocks.
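As a sketch of why keeping those four fields fixed is useful, the C fragment
below walks a chain of blocks using only the common header. The struct name,
the helper, and the use of array indices in place of on-disk block addresses
are all assumptions made for the example; a real tool would read blocks from
the database by address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the four fields shared by every entry type. */
struct sk_common_hdr {
    int32_t flags;   /* low 16 bits carry the type flags */
    int32_t id;
    int32_t cellid;
    int32_t next;    /* next block in the same logical structure, 0 = end */
};

/*
 * Count the blocks in the chain starting at index `start`.
 * Returns -1 if the chain points outside the table or takes more steps
 * than there are blocks, which implies a loop.
 */
static int sk_chain_length(const struct sk_common_hdr *blocks,
                           size_t nblocks, int32_t start)
{
    size_t steps = 0;
    int32_t cur = start;

    assert(blocks != NULL);
    while (cur != 0) {
        if (cur < 0 || (size_t)cur >= nblocks || ++steps > nblocks)
            return -1;
        cur = blocks[cur].next;
    }
    return (int)steps;
}
```

The walker never looks past the common header, so it would work equally well
on block types it does not understand.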
%%%
Another desired property for a format extension is recoverability from
minor corruption. Extension entries will include the id of the entry they
correspond to, and link fields help tie related entries together. That
should be enough to (say) reconstruct a hash table if it gets lost or
corrupted. This design goal is necessarily less well specified than the
others, as it will always be possible to corrupt a database to an
unrecoverable state. There is a tradeoff between resiliency and
efficiency -- lots of link fields ease reconstruction but consume space
and resources. I don't think that our application is particularly
sensitive to this tradeoff; any reasonable level of linking is probably
fine.
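A minimal sketch of the kind of recovery meant here, in C: scan every
allocated block and rebuild the hash chains from the ids stored in the
entries themselves. The struct layout, the tiny table size, and the bucket
function (just the id masked down) are placeholders invented for the example,
not the prdb's actual layout or hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_HASHSIZE 8u  /* small power of two, for illustration only */

/* Hypothetical extension entry; flags == 0 means "free" in this sketch. */
struct sk_ext_entry {
    int32_t flags;
    int32_t id;         /* id of the entry this block belongs to */
    int32_t next_hash;  /* next block in the same bucket, 0 = end */
};

/* Placeholder bucket function, not the real prdb hash. */
static uint32_t sk_bucket_of(int32_t id)
{
    return (uint32_t)id & (SK_HASHSIZE - 1u);
}

/* Rebuild `table` and every next_hash link by scanning all blocks. */
static void sk_rebuild_hash(struct sk_ext_entry *blocks, size_t nblocks,
                            int32_t table[SK_HASHSIZE])
{
    size_t i;

    assert(blocks != NULL && table != NULL);
    for (i = 0; i < SK_HASHSIZE; i++)
        table[i] = 0;
    for (i = 1; i < nblocks; i++) {  /* index 0 is reserved as "end" */
        uint32_t bucket;

        if (blocks[i].flags == 0)
            continue;                /* skip free blocks */
        bucket = sk_bucket_of(blocks[i].id);
        blocks[i].next_hash = table[bucket];  /* push onto bucket chain */
        table[bucket] = (int32_t)i;
    }
}

/* Return the block index holding `id`, or 0 if it is not present. */
static int32_t sk_lookup(const struct sk_ext_entry *blocks,
                         const int32_t table[SK_HASHSIZE], int32_t id)
{
    int32_t cur = table[sk_bucket_of(id)];

    while (cur != 0 && blocks[cur].id != id)
        cur = blocks[cur].next_hash;
    return cur;
}
```

The rebuild needs nothing beyond the id stored in each entry, which is the
point of carrying that id in every extension block.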
%%%
On Sat, 18 May 2013, Simon Wilkinson wrote:
> Across the tree, I've been moving OpenAFS towards using jhash for
> hashing. However, there are some challenges about using this for ubik
> databases. In particular, the current code doesn't attempt to cater for
> endianness. I suspect you will get different answers for jhash2 on big
> and little endian processors. Fixing this shouldn't be that complex -
> the original lookup3.c code does the right thing, it's just a case of
> adapting that for OpenAFS.
Yeah, we'd need to either make a wrapper that does byteswaps or pull in a
new snapshot. A new snapshot with 'nbo' or similar in the name sounds
promising.
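A sketch of the byteswapping-wrapper option, in C. The idea is to assemble
each 32-bit word from the stored bytes in network order before handing it to
a word-oriented hash, so the result does not depend on the host's endianness.
sk_word_hash() is a trivial stand-in and is NOT the Jenkins hash; it and the
sk_hash_nbo() wrapper are hypothetical names, not existing OpenAFS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit word from `p` in network (big-endian) byte order. */
static uint32_t sk_load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Placeholder word hash standing in for jhash2(); NOT the Jenkins hash. */
static uint32_t sk_word_hash(const uint32_t *k, size_t nwords, uint32_t seed)
{
    uint32_t h = seed;
    size_t i;

    for (i = 0; i < nwords; i++)
        h = (h ^ k[i]) * 16777619u;
    return h;
}

/*
 * Hypothetical wrapper: hash `nwords` words stored at `buf` in network
 * byte order. Limited to 16 words to keep the sketch simple.
 */
static uint32_t sk_hash_nbo(const unsigned char *buf, size_t nwords,
                            uint32_t seed)
{
    uint32_t words[16];
    size_t i;

    assert(nwords <= 16);
    for (i = 0; i < nwords; i++)
        words[i] = sk_load_be32(buf + 4 * i);
    return sk_word_hash(words, nwords, seed);
}
```

Because the words are built with shifts and not by casting the buffer to
uint32_t *, the same bytes hash to the same value on any host.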
The jenkins family of hashes also has the nice property that the table
size need not be a prime -- we can use a size of 8192 and a mask to get
the table index instead of a modular division.
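For illustration, the mask version of the index computation could look like
the C fragment below. SK_TABLE_SIZE and sk_bucket() are hypothetical names;
only the size 8192 comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TABLE_SIZE 8192u  /* must be a power of two for the mask to work */

/* Map a 32-bit hash value to a table index by keeping its low 13 bits. */
static uint32_t sk_bucket(uint32_t hash)
{
    /* Sanity check: a power of two has exactly one bit set. */
    assert((SK_TABLE_SIZE & (SK_TABLE_SIZE - 1u)) == 0);
    return hash & (SK_TABLE_SIZE - 1u);
}
```

For a power-of-two size the mask gives the same index as hash % SK_TABLE_SIZE
without the division. It does rely on the low bits of the hash being well
mixed, which is the property being claimed for the jenkins hashes.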
-Ben