[OpenAFS-devel] Re: Cache corruption with RX busy code

Andrew Deason adeason@sinenomine.net
Thu, 18 Apr 2013 16:39:36 -0500


On Thu, 18 Apr 2013 13:39:39 -0400
Jeffrey Altman <jaltman@your-file-system.com> wrote:

> My gut reaction is that the additional expense both in terms of round
> trip time and server load is not worth the optimization of immediately
> retrying a call that received BUSY on another channel.

Yes, I'd be okay with this. I think the primary motivation for the
original change was really to avoid retrying forever when a busy call
channel doesn't go away. We can still get that by keeping the 'lastBusy'
processing for new calls, and removing only the parts that immediately
error out an existing call when BUSY is received.
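
To make the split concrete, here is a rough sketch of what I mean. This
is not the actual rx code: the struct, the function names, and the
channel count are all made up for illustration. The idea is only that a
BUSY on an existing call gets recorded and nothing more, and that the
record is consulted when picking a channel for a new call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical channel count, chosen only for this sketch. */
#define SKETCH_MAX_CHANNELS 4

/* Hypothetical stand-in for per-connection channel state. */
struct conn_sketch {
    int in_use[SKETCH_MAX_CHANNELS];       /* nonzero if a call holds it */
    time_t last_busy[SKETCH_MAX_CHANNELS]; /* 0 = no BUSY recorded */
};

/*
 * BUSY received for an existing call: only remember when it happened.
 * The call itself is left alone and ends via its normal timeout.
 */
void note_busy(struct conn_sketch *conn, int channel, time_t now)
{
    assert(conn != NULL);
    assert(channel >= 0 && channel < SKETCH_MAX_CHANNELS);
    conn->last_busy[channel] = now;
}

/*
 * Pick a channel for a new call. Prefer a free channel with no BUSY
 * recorded; otherwise fall back to the free channel whose BUSY is
 * oldest, so we don't keep hammering the same stuck channel.
 * Returns -1 if no channel is free.
 */
int pick_channel(const struct conn_sketch *conn)
{
    int best = -1;
    int i;

    assert(conn != NULL);
    for (i = 0; i < SKETCH_MAX_CHANNELS; i++) {
        if (conn->in_use[i])
            continue;
        if (conn->last_busy[i] == 0)
            return i;
        if (best < 0 || conn->last_busy[i] < conn->last_busy[best])
            best = i;
    }
    return best;
}
```

The point is just that note_busy() has no error path; how and when a
recorded BUSY gets cleared is left out here.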

Waiting for a normal timeout due to a long-running busy channel doesn't
seem unreasonable, since such channels should be rare. Conceivably we
could have a shorter timeout specifically for BUSY channels, but that's
probably not worth worrying about. If you're experiencing these all the
time, then you've got a problem that you should be looking into.
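
For what it's worth, the "shorter timeout for BUSY channels" idea would
amount to something like the following. Again, this is only a sketch
under my own assumptions: the struct, the field names, and the timeout
parameters are hypothetical and don't correspond to existing rx
variables.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-call state for this sketch. */
struct call_sketch {
    time_t start; /* when the call was started */
    int saw_busy; /* nonzero once BUSY was received for this call */
};

/*
 * Returns nonzero if the call should be timed out at 'now'.
 * A call that has seen BUSY uses busy_timeout when that is set (> 0)
 * and shorter than normal_timeout; every other call uses
 * normal_timeout.
 */
int call_timed_out(const struct call_sketch *call, time_t now,
                   time_t normal_timeout, time_t busy_timeout)
{
    time_t limit = normal_timeout;

    assert(call != NULL);
    if (call->saw_busy && busy_timeout > 0 && busy_timeout < limit)
        limit = busy_timeout;
    return (now - call->start) >= limit;
}
```

Passing 0 for busy_timeout gives the plain "wait for the normal timeout"
behavior, which is what I'd go with.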

We may need a change to guarantee that we actually hit such a timeout
(if we agree that is actually desired). I have a feeling that we were
previously depending on idledead/harddead to get any timeout at all,
though I may be wrong about that.

-- 
Andrew Deason
adeason@sinenomine.net