[OpenAFS-devel] "Lost contact with file server" problems
Jeffrey Hutzelman
jhutz@cmu.edu
Fri, 26 Aug 2005 16:35:58 -0400
On Monday, August 22, 2005 16:52:29 -0400 Jeffrey Altman
<jaltman@secure-endpoints.com> wrote:
> I'm sure there is code in the client that identifies expired tokens
> and removes them. I just don't believe that code is associated in
> any way with the code that processes RXKADEXPIRED errors.
Well, I don't know what strangeness you might have in the Windows client.
The traditional client _does_ discard a user's tokens when it gets any
authentication error, including RXKADEXPIRED.
> I'm also suspicious of why the server has no code that specifically
> addresses RXKADEXPIRED errors if the client is allowed to send them
> to the server.
The client isn't specifically sending RXKADEXPIRED. It is sending an abort
because it received a packet on a connection that is in error. Such
aborts, whether sent by the client or server, _always_ contain the error
code corresponding to the current error on the call.
The server doesn't need to _do_ anything special in response to this
particular error. It just needs to propagate the error back up the call
chain, which it does, so that whatever procedure is handling this call gets
an error on its next rx_Write (or other Rx I/O call) and aborts. This is all
perfectly normal.
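To make the propagation concrete, here is a rough toy model of it in C. It is only a sketch: every name in it (toy_call, toy_write, toy_handler, TOY_EXPIRED and so on) is made up for illustration and is not part of the Rx API, and TOY_EXPIRED is an arbitrary placeholder rather than the real RXKADEXPIRED value. The idea is just that an incoming abort records its code on the call, any abort sent for that call carries the same code, and the handler sees a failed write and gives up.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not the real Rx structures or calls. */
enum { TOY_EXPIRED = 1001 };   /* arbitrary placeholder for RXKADEXPIRED */

struct toy_call {
    int error;                 /* 0 while the call is healthy */
};

/* Receiving an abort: record the peer's error code on the call. */
void toy_receive_abort(struct toy_call *call, int abort_code)
{
    assert(abort_code != 0);   /* an abort always carries a real error */
    if (call->error == 0)
        call->error = abort_code;
}

/* An abort sent for a call in error carries that call's current error. */
int toy_abort_code(const struct toy_call *call)
{
    return call->error;
}

/* Stand-in for a write: sends nothing once the call is in error. */
int toy_write(struct toy_call *call, const char *buf, int len)
{
    (void)buf;
    if (call->error)
        return 0;              /* short write signals the failure */
    return len;
}

/*
 * A handler that streams nchunks chunks. An abort "arrives" just before
 * chunk number abort_after (pass -1 for no abort). The handler stops at the
 * first failed write and returns the call's error to its caller.
 */
int toy_handler(struct toy_call *call, int nchunks, int abort_after)
{
    const char chunk[] = "data";
    int i;

    for (i = 0; i < nchunks; i++) {
        if (i == abort_after)
            toy_receive_abort(call, TOY_EXPIRED);
        if (toy_write(call, chunk, 4) != 4)
            return call->error;    /* propagate up the call chain */
    }
    return 0;
}
```

Calling toy_handler(&call, 5, 2) on a zeroed call returns TOY_EXPIRED after two successful writes; with abort_after set to -1 it returns 0. Nothing in the handler is specific to the particular error code, which is the point.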
Now, as Derrick noted, the RXKADEXPIRED is in fact not originating in the
client, but in the _server_; the connection is in error because an abort on
that connection was received two or three minutes earlier with an error
code of RXKADEXPIRED.
The confusing thing is, once the connection is in error, why is the client
ever sending a new request to the server? The answer appears to be that
rx_NewCall on a connection in error does not fail (not surprising; IIRC the
assumption is that rx_NewCall always succeeds), but also does not propagate
the connection's error state down to the call. IMHO this is a bug.
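As a toy illustration of the behavior I think we are seeing, consider the sketch below. The types and functions (sketch_conn, sketch_call, sketch_new_call_ignoring, sketch_new_call_inheriting, sketch_may_send) are hypothetical stand-ins, not the real struct rx_connection, struct rx_call, or rx_NewCall, and the error value used in the check is arbitrary. It only contrasts a new call that ignores the connection's error with one that inherits it.

```c
#include <assert.h>

/* Toy model only: hypothetical stand-ins for the real Rx structures. */
enum sketch_mode { SKETCH_MODE_SENDING, SKETCH_MODE_ERROR };

struct sketch_conn {
    int error;                 /* nonzero once the connection is in error */
};

struct sketch_call {
    enum sketch_mode mode;
    int error;
};

/* Behavior described above: the new call ignores the connection's error. */
struct sketch_call sketch_new_call_ignoring(const struct sketch_conn *conn)
{
    struct sketch_call call = { SKETCH_MODE_SENDING, 0 };
    (void)conn;
    return call;
}

/* Alternative: the new call starts out carrying the connection's error. */
struct sketch_call sketch_new_call_inheriting(const struct sketch_conn *conn)
{
    struct sketch_call call = { SKETCH_MODE_SENDING, 0 };
    if (conn->error) {
        call.mode = SKETCH_MODE_ERROR;
        call.error = conn->error;
    }
    return call;
}

/* A caller should only put a request on the wire for a healthy call. */
int sketch_may_send(const struct sketch_call *call)
{
    return call->error == 0 && call->mode == SKETCH_MODE_SENDING;
}

/* Small self-check contrasting the two variants on an errored connection. */
void sketch_check(void)
{
    struct sketch_conn conn = { 1001 };   /* arbitrary placeholder error */
    struct sketch_call a = sketch_new_call_ignoring(&conn);
    struct sketch_call b = sketch_new_call_inheriting(&conn);

    assert(sketch_may_send(&a));          /* request goes out anyway */
    assert(!sketch_may_send(&b));         /* caller sees the error first */
    assert(b.error == conn.error);
}
```

In the first variant the caller has no way to tell that the connection is already dead, so it sends a request that can only draw another abort. In the second the error is visible on the call before anything is sent.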
If this is in fact the problem, I believe the patch below will make the
client notice the error condition on the newly-created call. There is
still some question as to why the client did not react to the RXKADEXPIRED
received in response to its _previous_ call. Of course, there's _also_ the
question as to why there was such a huge latency between the data packet on
that call and the resulting abort.
-- Jeff
Index: rx.c
===================================================================
RCS file: /cvs/openafs/src/rx/rx.c,v
retrieving revision 1.83
diff -u -r1.83 rx.c
--- rx.c 19 Aug 2005 19:20:44 -0000 1.83
+++ rx.c 26 Aug 2005 20:31:19 -0000
@@ -1146,7 +1146,12 @@
     /* Client is initially in send mode */
     call->state = RX_STATE_ACTIVE;
-    call->mode = RX_MODE_SENDING;
+    if (conn->error) {
+        call->mode = RX_MODE_ERROR;
+        call->error = conn->error;
+    } else {
+        call->mode = RX_MODE_SENDING;
+    }
     /* remember start time for call in case we have hard dead time limit */
     call->queueTime = queueTime;