[OpenAFS-devel] prdb format extension for extended authentication names

Benjamin Kaduk kaduk@MIT.EDU
Fri, 17 May 2013 19:32:30 -0400 (EDT)


On Fri, 17 May 2013, Benjamin Kaduk wrote:

> On Fri, 17 May 2013, Simon Wilkinson wrote:
>
>> hash to be used to for the authentication name hash table isn't specified.
>
> Reusing the old NameHash is probably not a good choice.  I'll think about 
> think about this, too.

Consulting Knuth, it sounds like the state of the art is a pretty old 
solution -- for a multiword/multicharacter key, use
h(K) = (h_1(x_1) + h_2(x_2) + ... + h_n(x_n)) mod M
where each h_i() applies to word i and is a hash function independent 
of the other h_j() (for us, M should probably remain 8191).

We could use either 8- or 32-bit words (since our data structure is 
inherently 32-bit word-aligned), but he notes that using characters allows 
the hash to be just a lookup table.

That formulation has very nice properties when the h_i() are chosen 
randomly, but we probably only want to specify a fixed number and reuse 
them cyclically for longer keys (as opposed to specifying an algorithm for 
defining an arbitrary h_i()).  It seems convenient to specify enough h_i() 
to handle a full block/entry's worth of name data -- the 32k of storage 
needed to hold all the hash function tables in memory is not excessively 
large.  This would also provide a convenient opportunity to apply the 
mod-8191 operation before overflow is possible.  Hmm, on an i7 that would 
even fit in L1 cache; maybe locality concerns are not particularly 
important.

Using a fixed set of repeating h_i() will mean that names which differ 
only by having a pair of characters swapped which are 128 positions apart 
will hash to the same value (and names which differ by arbitrarily many 
such swaps or permutations).  I am not particularly familiar with the 
structure of x500 names, but I hope that 128 is large enough for these 
congruencies to be effectively random.

-Ben