[AFS3-std] rxgk: Rekeying

Tom Keiser tkeiser@sinenomine.net
Wed, 14 Oct 2009 20:46:09 -0400


On Wed, Oct 14, 2009 at 7:04 PM, Chaskiel Grundman <cg2v@andrew.cmu.edu> wrote:
>
>> By "application layer" - do you mean the application (ie, fileserver,
>> ptserver) instead of the security mechanism?  I can't speak for the
>> rxgk side of things, but speaking from the rxk5 side of things this
>> is something I was specifically seeking to avoid: application/mechanism
>> specific glop.  Or, in a word, "yuck".  But, ignoring that problem...
>
> Well, more like the client application (cache manager, ubik). When the key
> expires, all calls on that connection start to fail, you throw it away, and
> call rx_NewConnection, which gets you a new cid, which means you get a new
> TK. The fileserver has some code for this, but I think it never gets
> triggered (i.e. it's leftover from the bcrypt days)
>
>> From the rx side of things, there is a problem: a connection may be
>> handling multiple calls, they may be in different stages of completion,
>> and there is no guarantee on how long any given call might last.  On the
>> server side, it's not possible to know what calls the client might have
>> sent that are in flight but haven't been received on the server.  This
>> means there is no logical "good" place timing-wise for the server to
>> return "VICETOKENDEAD".  Waiting for idle moments on a connection on the
>> server may not be possible for a sufficiently busy connection.  Adding
>> slop in will reduce problems, but not eliminate them.
>
> Once you decide the key is expired, all in progress rpcs start failing in
> _CheckPacket, as if the token was expired (but with an error code that
> causes an immediate retry)
>
>> idempotent.  And, of course, if a bulk data write is larger than the
>> "byte window" in rxgk, things get worse.
>
> I had hoped to avoid additional complexity, but that's a good reason to
> either do rekeying, or make the byte life more advisory (i.e. the transport
> doesn't enforce it at all, but the app enforces it at the beginning of an rpc
> if it wants to, as src/viced/host.c:GetClient() redundantly does for token
> lifetimes).

I'm not sure advisory byte life is going to be satisfactory for all
rxgk use cases.  Consider the afs3 volume dump and forward RPCs, which
treat the split interface as a long-lived bulk transport.  We know
that there are sites which routinely move several terabytes of data
under a single split call, over periods of many hours.  Given the
extremely large amounts of data being transported, and the relatively
long time periods over which a split call can exist, I think we would
be remiss to ignore the rekeying problem.
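
To put rough numbers on this, here is a minimal sketch in C.  The
helper rekeys_needed() is hypothetical and not part of rx or rxgk, and
the 4 GiB byte life and 3 TiB transfer size used below are assumptions
chosen only for illustration, not values from any rxgk draft.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many times a single call would have to
 * rekey so that no one key protects more than byte_life bytes.
 * A byte_life of 0 is treated here as "no limit" (an assumption).
 */
static uint64_t rekeys_needed(uint64_t call_bytes, uint64_t byte_life)
{
    if (byte_life == 0 || call_bytes <= byte_life)
        return 0;
    /* ceil(call_bytes / byte_life) keys in total; the first is not a rekey */
    return (call_bytes - 1) / byte_life;
}

/* Assumed example sizes, for illustration only. */
static const uint64_t EXAMPLE_BYTE_LIFE  = (uint64_t)4 << 30;  /* 4 GiB */
static const uint64_t EXAMPLE_CALL_BYTES = (uint64_t)3 << 40;  /* 3 TiB */
```

With those assumed numbers, rekeys_needed(EXAMPLE_CALL_BYTES,
EXAMPLE_BYTE_LIFE) is 767, and every one of those boundaries falls in
the middle of the call, where a check made only at the beginning of the
RPC would never see it.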

-Tom