[OpenAFS-devel] prdb format extension for extended authentication
names
Benjamin Kaduk
kaduk@MIT.EDU
Fri, 17 May 2013 19:32:30 -0400 (EDT)
On Fri, 17 May 2013, Benjamin Kaduk wrote:
> On Fri, 17 May 2013, Simon Wilkinson wrote:
>
>> hash to be used to for the authentication name hash table isn't specified.
>
> Reusing the old NameHash is probably not a good choice. I'll think about
> think about this, too.
Consulting Knuth, it sounds like the state of the art is a pretty old
solution -- for a multiword/multicharacter key, use
h(K) = (h_1(x_1) + h_2(x_2) + ... + h_n(x_n)) mod M
where each h_i() applies to word i and is a hash function independent
of the other h_j() (for us, M should probably remain 8191).
We could use either 8- or 32-bit words (since our data structure is
inherently 32-bit word-aligned), but he notes that using characters allows
the hash to be just a lookup table.
That formulation has very nice properties when the h_i() are chosen
randomly, but we probably only want to specify a fixed number and reuse
them cyclically for longer keys (as opposed to specifying an algorithm for
defining an arbitrary h_i()). It seems convenient to specify enough h_i()
to handle a full block/entry's worth of name data -- the 32k of storage
needed to hold all the hash function tables in memory is not excessively
large. This would also provide a convenient opportunity to apply the
mod-8191 operation before overflow is possible. Hmm, on an i7 that would
even fit in L1 cache; maybe locality concerns are not particularly
important.
Using a fixed set of repeating h_i() will mean that names which differ
only by having a pair of characters swapped which are 128 positions apart
will hash to the same value (and names which differ by arbitrarily many
such swaps or permutations). I am not particularly familiar with the
structure of x500 names, but I hope that 128 is large enough for these
congruencies to be effectively random.
-Ben