[OpenAFS-devel] Re: idle dead timeout processing in clients

Andrew Deason adeason@sinenomine.net
Wed, 30 Nov 2011 13:34:18 -0600


On Wed, 30 Nov 2011 14:06:34 -0500
Jeffrey Altman <jaltman@secure-endpoints.com> wrote:

> > I'm not sure if we're talking about completely different things or
> > what.  The afs_BlackListOnce code exists in (shipping) 1.4 and, I
> > mean, it certainly gets _called_. If I insert a sleep(10000) into
> > the FetchStatus handler, the client will give an error (or failover
> > to another site, etc); it won't just hang forever on the request.
> 
> That is not a valid simulation for this case.  Idle dead timeouts
> occur when keepalives are being received but no actual data.

...and how does the above not do that? Rx keepalives still go across the
wire; it just blocks the one thread, so it's just the RPC itself that
doesn't make progress. rxevents and such will still fire.

And I still don't understand what the purported purpose of
afs_BlackListOnce is, if not for handling transient token errors and
idle errors. We see it get hit all the time (usually for the latter
reason); bugs have been fixed in it during the life of 1.4.

> > And this exists in 1.4 rxi_CheckCall:
> > 
> >     /* see if we have a non-activity timeout */
> >     if (call->startWait && conn->idleDeadTime
> >         && ((call->startWait + conn->idleDeadTime) < now) &&
> >         (call->flags & RX_CALL_READER_WAIT)) {
> >         if (call->state == RX_STATE_ACTIVE) {
> >             rxi_CallError(call, RX_CALL_TIMEOUT);
> >             return -1;
> >         }
> >     }
> 
> Notice the RX_CALL_READER_WAIT check.  This is for server side
> processing.  It is used by viced.  That code should remain.

I'm not sure if you're trying to say that the above code block is only
executed on viced...? There's nothing viced- or server-specific about
RX_CALL_READER_WAIT being set; clients can wait on the net to read data,
too.
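
To make that concrete, here is a stripped-down, standalone model of the
check quoted above. The struct layouts, the constant values, and the name
idle_dead_expired are stand-ins I made up for illustration, not the real
rx definitions. The point is only that nothing in the condition depends
on which side of the connection the call is on.

```c
/* Stand-in values for illustration; not copied from rx.h. */
#define RX_CALL_READER_WAIT 0x1
#define RX_STATE_ACTIVE     2

/* Simplified model of a connection; only the field the check reads. */
struct model_conn {
    unsigned int idleDeadTime;  /* seconds; 0 disables the idle check */
};

/* Simplified model of a call; only the fields the check reads. */
struct model_call {
    unsigned int startWait;     /* when the reader began waiting; 0 if not waiting */
    unsigned int flags;
    int state;
};

/*
 * Hypothetical helper mirroring the quoted condition: returns 1 if the
 * non-activity check would fail the call with RX_CALL_TIMEOUT, else 0.
 * Note there is no "am I a server?" test anywhere in here.
 */
static int
idle_dead_expired(const struct model_call *call,
                  const struct model_conn *conn, unsigned int now)
{
    if (call->startWait && conn->idleDeadTime
        && ((call->startWait + conn->idleDeadTime) < now)
        && (call->flags & RX_CALL_READER_WAIT)) {
        if (call->state == RX_STATE_ACTIVE)
            return 1;
    }
    return 0;
}
```

With startWait = 100 and idleDeadTime = 50, this trips once now goes past
150, regardless of whether the waiting reader is a fileserver thread or a
client thread waiting on a reply.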

This is also only one of about two (I think) places where we generate an
RX_CALL_TIMEOUT error, the other being hard-dead timeouts.
afs_BlackListOnce processing only occurs for RX_CALL_TIMEOUT errors (and
some rxkad stuff), and hard-dead timeouts are not enabled on non-VL
conns, so...
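
Spelled out as a sketch, assuming there really are only the two sources
I mentioned (which I haven't double-checked). The enum, the function
name classify_timeout, and the constant value are all hypothetical and
not taken from the tree:

```c
/* Stand-in value for illustration; not copied from rx.h. */
#define RX_CALL_TIMEOUT (-3)

enum timeout_source {
    SRC_NONE,       /* not a timeout error, or neither check is enabled */
    SRC_IDLE_DEAD,  /* only the idle-dead check could have fired */
    SRC_HARD_DEAD,  /* only the hard-dead check could have fired */
    SRC_EITHER      /* both enabled; cannot tell from the error alone */
};

/*
 * Hypothetical helper: given an error code and which timeouts are
 * enabled on the conn, say which check could have produced the error.
 * Assumes idle-dead and hard-dead are the only RX_CALL_TIMEOUT sources.
 */
static enum timeout_source
classify_timeout(int error, int idle_enabled, int hard_enabled)
{
    if (error != RX_CALL_TIMEOUT)
        return SRC_NONE;
    if (idle_enabled && hard_enabled)
        return SRC_EITHER;
    if (idle_enabled)
        return SRC_IDLE_DEAD;
    if (hard_enabled)
        return SRC_HARD_DEAD;
    return SRC_NONE;
}
```

On a non-VL conn, hard_enabled is 0, so any RX_CALL_TIMEOUT that reaches
the blacklisting code can only come out as SRC_IDLE_DEAD under this
model.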

-- 
Andrew Deason
adeason@sinenomine.net