[OpenAFS-devel] "Lost contact with file server" problems

Roland Kuhn rkuhn@e18.physik.tu-muenchen.de
Mon, 22 Aug 2005 13:23:35 +0200 (CEST)


Hi Jeffrey!

On Mon, 22 Aug 2005, Jeffrey Altman wrote:

> Roland Kuhn wrote:
>> Hi folks!
>>
>> On Sun, 21 Aug 2005, Derrick J Brashear wrote:
>>
>>> it needs to include the first error packet, e.g. the window where it
>>> loses contact, to be useful
>>>
>> Okay, it happened again, and I have a full trace:
>>
>> http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs-fail-trace.cap
>> http://www.e18.physik.tu-muenchen.de/~rkuhn/openafs-fail-trace-end.cap
>>
>> The latter contains only the last 81 frames and begins a few frames
>> before the request which fails. The former is 10MB in size. If you need
>> more history, I also have the last 1GB of the connection available.
>> 192.168.18.2 is the server, 192.168.18.39 the client. The access is for
>> big files typically.
>>
>> Ciao,
>>                     Roland
>
> The Abort code is RXKADEXPIRED (19270409L).   Would you verify that you
> still have a valid token and that your system clocks are in sync?
>
The clocks are perfectly synchronized, and I'm pretty sure that the batch
jobs have valid tokens; otherwise I would see other failures as well.
Also, wouldn't it be very nasty to effectively disable a complete client
because one connection has no valid token?

The other thing is: it is the _client_ that sends the first ABORT, in
response to a challenge...
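
For anyone who wants to check this against the trace, here is a rough
sketch of how the UDP payload of a single Rx packet could be decoded to
see its type, which side sent it, and the abort code. It assumes the
usual 28-byte Rx wire header with the packet type at offset 20 (ABORT = 4,
CHALLENGE = 6) and the client-initiated flag in bit 0x01; the function
name and the returned dictionary are made up for illustration, and only
the one abort code mentioned in this thread is mapped.

```python
import struct

# Assumed Rx wire header layout (28 bytes, network byte order):
# epoch, cid, callNumber, seq, serial (5 x uint32),
# type, flags, userStatus, securityIndex (4 x uint8),
# a 16-bit spare/checksum field, and the 16-bit serviceId.
RX_HEADER = struct.Struct("!IIIIIBBBBHH")

RX_TYPE_ABORT = 4
RX_TYPE_CHALLENGE = 6
RX_FLAG_CLIENT_INITIATED = 0x01

# Only the code quoted in this thread; other rxkad codes are left out.
KNOWN_ABORT_CODES = {19270409: "RXKADEXPIRED"}


def describe_rx_packet(payload):
    """Hypothetical helper: summarize one Rx packet (UDP payload bytes)."""
    if len(payload) < RX_HEADER.size:
        raise ValueError("payload shorter than an Rx header")
    fields = RX_HEADER.unpack_from(payload)
    ptype, flags = fields[5], fields[6]
    info = {
        "type": ptype,
        "from_client": bool(flags & RX_FLAG_CLIENT_INITIATED),
        "abort_code": None,
        "abort_name": None,
    }
    # An ABORT packet carries a 32-bit error code right after the header.
    if ptype == RX_TYPE_ABORT and len(payload) >= RX_HEADER.size + 4:
        (code,) = struct.unpack_from("!i", payload, RX_HEADER.size)
        info["abort_code"] = code
        info["abort_name"] = KNOWN_ABORT_CODES.get(code, "unknown")
    return info
```

Feeding it the payload of the first ABORT frame should show whether
"from_client" is set and whether the code is really 19270409.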

Ciao,
                    Roland