[OpenAFS-devel] Re: Cache corruption with RX busy code

chas williams - CONTRACTOR chas@cmf.nrl.navy.mil
Thu, 18 Apr 2013 07:49:28 -0400


On Sat, 13 Apr 2013 01:36:31 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> I'm not looking at the code at the moment, but don't we get the serial
> of the offending packet in the BUSY we receive? Therefore, we should be
> able to ignore a BUSY packet if it does not reference the most recent
> serial we've sent.

Even if we did, I think a race would remain: two new connections would
both start with the same serial number, so a BUSY meant for one could
match the other. (Randomizing the starting serial numbers would only
make the problem less likely; it would still be possible.)

> I'm not really checking myself here and looking rather quickly, so I may
> be remembering stuff entirely incorrectly. But if any of all that makes
> any sense, eliminating such errors is impossible and we just need to
> discard cache data for all uncertain cases.

This does seem to be the case. RX doesn't have a three-way handshake
like TCP, so I don't think this race is fixable without a protocol
change.