[OpenAFS-devel] prdb format extension for extended authentication names

Thu, 30 May 2013 11:39:11 -0400 (EDT)

jhutz found some old notes as well, the main thing I take from them being 
backwards compatibility.

Up to now I had only been concerned with staying within the existing 
block structure, which is fairly intertwined with backwards compatibility.

I'll try to list the design considerations I know of for a prdb extension, 
and say a bit about some of them: backwards and forwards compatibility, 
the ability for db-maintenance tools to recover from (e.g.) hash table 
corruption, and preserving existing invariants come to mind.  Also, jhutz 
has just about convinced me that it is irresponsible to use the last spare 
field for a specific extension (as opposed to a general extensible 
structure), even if we think that a full format revision is coming "soon", 
so that should be added to the design considerations list.

Are there other considerations that we should take into account?

Within these considerations (and any others that come up in this 
discussion), I am working on a concrete proposal.  Would people prefer to 
see this in the form of ptserver.h struct declarations and comments, or an 
addition to the prdb format writeup I have at 
https://github.com/kaduk/openafs/blob/prdb/doc/txt/prdb.txt ?

Per-consideration notes:
%%%

Backwards compatibility is pretty easy: all we have to do is not touch 
existing structures and stick strictly to extensions of the existing 
format.  Then the new code will handle existing databases just fine.

%%%

It is strongly desirable to have forwards compatibility, namely, an old 
ptserver should not choke on or scribble over the new style entries.  It 
is hard to guarantee that an old ptserver will not see new style entries 
without updating all dbservers at once, and there are operational reasons 
to want to phase in new code.  By lucky chance, forward compatibility is 
possible -- the old code recognizes PRFOREIGN and PRINST in the flags 
field as being valid entries, but does not generate them.  This lets us 
steal one of these bits, say PRINST, to indicate that an entry is an 
"extended entry", and within such extended entries use the unallocated 
flag bits to distinguish between types of entries.  There are eight 
unused "type flags" bits, though perhaps we need not claim all of them, 
particularly if we use them as an integral enumeration of types and not as 
flag bits.  I'm not entirely sure what other types of extended entries we 
might want and whether the enum treatment is appropriate.  The old notes 
I'm looking at sketch out a generic "optentry" to hold "option blocks", 
with a field for what kind of option and an afsUUID to which they belong 
(to prevent option blocks from being incorrectly reused when a pts id is 
recycled), but I'm not tied to that.  The comments indicate it could be 
used for supergroup information if someone wanted to clean up/reimplement 
that code.
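
To make the flag-bit idea concrete, here is a rough sketch of the "enum in 
the unused type_flags bits" variant.  The numeric bit values, the choice of 
bits 8-15 for the subtype, and every PREXT_* name are assumptions made up 
for illustration; the real definitions belong in ptserver.h, and the flags 
word is taken to be already in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Entry-type bits in the low byte of the flags word.  These numeric values
 * are placeholders for the sketch, not copied from ptserver.h. */
#define PRFREE     0x01u
#define PRGRP      0x02u
#define PRCONT     0x04u
#define PRCELL     0x08u
#define PRFOREIGN  0x10u
#define PRINST     0x20u

/* Hypothetical: reuse PRINST as the "extended entry" marker and treat the
 * eight unallocated type_flags bits (assumed to be bits 8-15) as a small
 * integer subtype rather than as independent flag bits. */
#define PREXTENDED   PRINST
#define PREXT_SHIFT  8
#define PREXT_MASK   0xff00u

enum prext_type {            /* hypothetical subtypes */
    PREXT_NONE     = 0,      /* not an extended entry */
    PREXT_AUTHNAME = 1,      /* extended authentication name */
    PREXT_OPTION   = 2       /* generic option block */
};

/* Build the low 16 bits of flags for a new extended entry. */
static uint32_t prext_make_flags(enum prext_type t)
{
    assert(t > 0 && t <= 0xff);
    return PREXTENDED | ((uint32_t)t << PREXT_SHIFT);
}

/* Old code only looks at the low type bits, so it sees "PRINST" here. */
static int prext_is_extended(uint32_t flags)
{
    return (flags & PREXTENDED) != 0;
}

/* New code additionally decodes the subtype from the upper byte. */
static enum prext_type prext_get_type(uint32_t flags)
{
    if (!prext_is_extended(flags))
        return PREXT_NONE;
    return (enum prext_type)((flags & PREXT_MASK) >> PREXT_SHIFT);
}
```

An enum gives up to 255 subtypes from the same eight bits, where separate 
flag bits would give only eight, at the cost of an entry never being two 
subtypes at once.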

%%%

The following fields are invariant in all existing entry structures; 
retaining them should allow old ptservers to recognize (and print, to some 
extent) the new entries we add:

flags (really only the low 16 bits, which I call "type_flags" in my format writeup)
id
cellid
next

Note that cellid is only rarely used.
Flags including PRINST will tell old code that this block is allocated, 
and next allows a utility reading the database to follow the chain of 
blocks in the same logical structure, even if it does not know exactly how 
to interpret those blocks.
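
As a sketch of what "follow the chain without understanding the blocks" 
could look like: the struct below holds just the four common fields, in the 
order listed above.  The field widths, the struct name, and the use of an 
array index (with 0 as the terminator) for next are simplifications I am 
assuming for the example; this is not the on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fields shared by every entry block. */
struct pr_common {
    uint32_t flags;   /* low 16 bits are the type_flags */
    int32_t  id;
    int32_t  cellid;
    int32_t  next;    /* in this sketch: index of the next block, 0 = end */
};

/* Count the blocks in the chain starting at index `start`.  The reader
 * needs no knowledge of the block types; it only trusts `next`.  The
 * `n < nblocks` bound stops the walk if a corrupt chain loops. */
static size_t chain_length(const struct pr_common *blocks, size_t nblocks,
                           int32_t start)
{
    size_t n = 0;
    int32_t cur = start;

    while (cur > 0 && (size_t)cur < nblocks && n < nblocks) {
        n++;
        cur = blocks[cur].next;
    }
    return n;
}
```

A result equal to nblocks on a database with other entries in it is a hint 
that the chain loops and a repair tool should take a closer look.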

%%%

Another desired property for a format extension is recoverability from 
minor corruption.  Extension entries will include the id of the entry they 
correspond to, and link fields help tie related entries together.  That 
should be enough to (say) reconstruct a hash table if it gets lost or 
corrupted.  This design goal is necessarily less well specified than the 
others, as it will always be possible to corrupt a database to an 
unrecoverable state.  There is a tradeoff between resiliency and 
efficiency -- lots of link fields ease reconstruction but consume space 
and resources.  I don't think that our application is particularly 
sensitive to this tradeoff; any reasonable level of linking is probably 
fine.
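
To illustrate the reconstruction case, here is a minimal sketch of 
rebuilding an id hash table purely by scanning blocks, relying on each 
block carrying the id it belongs to.  Everything here is hypothetical: the 
struct, the tiny eight-bucket table, the bucket function, and the use of 
array indices with 0 as "null" are stand-ins, not the prdb layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_ALLOCATED 0x20u   /* placeholder for the "extended entry" bit */
#define NBUCKETS      8u      /* tiny power-of-two table for the sketch */

struct ext_block {
    uint32_t flags;      /* allocated iff EXT_ALLOCATED is set */
    int32_t  id;         /* id of the pts entry this block belongs to */
    int32_t  next_hash;  /* next block in the same bucket, 0 = end */
};

static uint32_t bucket_of(int32_t id)
{
    return (uint32_t)id & (NBUCKETS - 1u);
}

/* Throw away whatever the table and hash links contain and rebuild them
 * from the ids stored in the blocks.  Block 0 is reserved as "null". */
static void rebuild_id_hash(struct ext_block *blocks, size_t nblocks,
                            int32_t table[NBUCKETS])
{
    size_t i;
    uint32_t b;

    for (i = 0; i < NBUCKETS; i++)
        table[i] = 0;
    for (i = 1; i < nblocks; i++) {
        if (!(blocks[i].flags & EXT_ALLOCATED))
            continue;                       /* skip free blocks */
        b = bucket_of(blocks[i].id);
        blocks[i].next_hash = table[b];     /* push onto bucket chain */
        table[b] = (int32_t)i;
    }
}

/* Return the index of the block for `id`, or 0 if there is none. */
static int32_t lookup_id(const struct ext_block *blocks,
                         const int32_t table[NBUCKETS], int32_t id)
{
    int32_t cur = table[bucket_of(id)];

    while (cur != 0) {
        if (blocks[cur].id == id)
            return cur;
        cur = blocks[cur].next_hash;
    }
    return 0;
}
```

The only per-block cost that makes this possible is the stored id, which 
is the kind of cheap redundancy I have in mind.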

%%%

On Sat, 18 May 2013, Simon Wilkinson wrote:

> Across the tree, I've been moving OpenAFS towards using jhash for 
> hashing. However, there are some challenges about using this for ubik 
> databases. In particular, the current code doesn't attempt to cater for 
> endianness. I suspect you will get different answers for jhash2 on big 
> and little endian processors. Fixing this shouldn't be that complex - 
> the original lookup3.c code does the right thing, it's just a case of 
> adapting that for OpenAFS.

Yeah, we'd need to either make a wrapper that does byteswaps or pull in a 
new snapshot.  A new snapshot with 'nbo' or similar in the name sounds 
promising.
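
For the wrapper option, the rough shape I have in mind is below: assemble 
each 32-bit word from the key bytes in a fixed (network) order before 
handing the words to a word-oriented hash, so the result does not depend 
on the host.  Note that mix_words() is a trivial stand-in mixer and NOT 
jhash or lookup3; it, hash_bytes_nbo(), and the 64-byte key limit are all 
invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXWORDS 16   /* sketch-only limit: keys of at most 64 bytes */

/* Assemble a 32-bit word from four bytes, most significant byte first.
 * The result is the same on big- and little-endian hosts. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Placeholder word mixer standing in for the real word-oriented hash. */
static uint32_t mix_words(const uint32_t *w, size_t n, uint32_t seed)
{
    uint32_t h = seed;
    size_t i;

    for (i = 0; i < n; i++) {
        h ^= w[i];
        h *= 0x01000193u;
    }
    return h;
}

/* Hash a byte string via fixed-order words; a short tail is zero padded. */
static uint32_t hash_bytes_nbo(const unsigned char *key, size_t len,
                               uint32_t seed)
{
    uint32_t words[MAXWORDS];
    unsigned char tail[4] = { 0, 0, 0, 0 };
    size_t nfull = len / 4, rem = len % 4, n, i;

    assert(len <= 4 * MAXWORDS);
    for (i = 0; i < nfull; i++)
        words[i] = load_be32(key + 4 * i);
    n = nfull;
    if (rem != 0) {
        for (i = 0; i < rem; i++)
            tail[i] = key[4 * nfull + i];
        words[n++] = load_be32(tail);
    }
    return mix_words(words, n, seed);
}
```

Zero padding means keys that differ only in trailing NUL bytes hash alike, 
so a real version would probably want to mix in the length as well.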

The jenkins family of hashes also has the nice property that the table 
size need not be a prime -- we can use a size of 8192 and a mask to get 
the table index instead of a modular division.
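
Concretely, with a power-of-two table the index computation is just a 
mask.  The macro and function names below are made up for the example, 
and 8192 is simply the size floated above.

```c
#include <assert.h>
#include <stdint.h>

#define PR_HASHSIZE 8192u                 /* hypothetical table size */
#define PR_HASHMASK (PR_HASHSIZE - 1u)    /* 0x1fff */

/* Map a 32-bit hash value to a bucket index in [0, PR_HASHSIZE). */
static uint32_t bucket_index(uint32_t hash)
{
    /* The mask trick is only valid when the size is a power of two. */
    assert((PR_HASHSIZE & PR_HASHMASK) == 0);
    return hash & PR_HASHMASK;
}
```

For a power-of-two size this gives the same answer as hash % PR_HASHSIZE; 
it leans on the hash mixing its low bits well, which is the property being 
credited to the jenkins family here.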

-Ben