[OpenAFS] Strange caching failures

Stephan Wonczak a0033@rrz.uni-koeln.de
Thu, 28 Feb 2013 15:57:03 +0100 (CET)


   Hi Derrick!

On Thu, 28 Feb 2013, Derrick Brashear wrote:

> Start with FileLog on the fileserver and see what errors it shows for B
> (or D or...) at around the time it breaks.

   Unfortunately, nothing jumps out. A colleague managed to pinpoint a 
failure to a period of five minutes, but nothing was logged on the 
fileserver during that window. The only remotely suspicious messages are 
of the type

Feb 28 14:45:03 vmfs05 fileserver[4346]: FindClient: stillborn client 
ac460e20(e8d98fa0); conn ac406ba0 (host XX.XX.XX.XX:7001) had client 
a718bf0(e8d98fa0)

but according to older posts this is a harmless, informative message. 
Additionally, I have seen these messages for years, and they did not 
coincide with the pinpointed incident.
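
   To double-check that nothing falls inside the five-minute window, the 
syslog output could be filtered by timestamp. The following is only a 
sketch in Python under some assumptions: the function names 
parse_syslog_time and lines_in_window are made up for illustration, the 
window boundaries in the example are placeholders rather than the real 
incident times, and an English/C locale is assumed so that %b matches 
month names like "Feb". Syslog timestamps carry no year, so it has to be 
passed in.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the leading 'Feb 28 14:45:03' stamp of a syslog-style line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def lines_in_window(lines, start, end, year):
    """Return the lines whose timestamp lies in [start, end]."""
    selected = []
    for line in lines:
        try:
            when = parse_syslog_time(line, year)
        except ValueError:
            # Wrapped continuation lines and blank lines have no timestamp.
            continue
        if start <= when <= end:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # First line of the message quoted above; the window is a placeholder.
    sample = [
        "Feb 28 14:45:03 vmfs05 fileserver[4346]: FindClient: stillborn client",
    ]
    start = datetime(2013, 2, 28, 14, 43)
    for hit in lines_in_window(sample, start, start + timedelta(minutes=5), 2013):
        print(hit)
```

   Feeding it the fileserver's syslog lines together with the window the 
colleague identified would show at a glance whether any message 
coincides with a failure.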
   Would jacking up the loglevel help?
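
   To narrow down the moment a client goes stale, a small probe could 
also be run on every client. This is a rough Python sketch, not a tested 
tool: client A writes the current time into the file, and the other 
clients compare what they read against their own clock. The helper 
names, the 300-second interval (matching the five-minute write cycle) 
and the 60-second slack are assumptions, and the clients' clocks are 
assumed to be reasonably synchronized.

```python
import time


def write_heartbeat(path, now=None):
    """Writer side (client A): store the current epoch time in the file."""
    now = time.time() if now is None else now
    with open(path, "w") as fh:
        fh.write(f"{now:.0f}\n")


def heartbeat_age(path, now=None):
    """Reader side: seconds between the stored time and the local clock."""
    now = time.time() if now is None else now
    with open(path) as fh:
        written = float(fh.read().strip())
    return now - written


def is_stale(path, interval=300, slack=60, now=None):
    """True if the content seen is older than one write cycle plus slack."""
    return heartbeat_age(path, now) > interval + slack
```

   Running write_heartbeat from cron on client A every five minutes and 
logging the result of is_stale on B, C and D once a minute would give a 
timestamp for when a client stops seeing updates, which could then be 
compared against the fileserver log.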

   Stephan

>
> Derrick
>
>
> On Feb 28, 2013, at 9:39, Stephan Wonczak <a0033@rrz.uni-koeln.de> wrote:
>
>>  Hi all!
>>  For the past few weeks, we have been struck by very weird behavior regarding cache updates of AFS clients. It looks like the callback sometimes does not work, and one client is stuck with an older version of the file in question. Example:
>>
>>  Write to file 'foo' on client A every five minutes.
>>  Clients B, C and D dutifully update their caches and see the updates.
>>  After some time, Client B suddenly does not see the updates any more, while clients C and D continue working fine.
>>  A 'fs flush foo' on client B corrects the problem.
>>  Other files are *NOT* affected and are updated fine on Client B.
>>
>>  This behavior is not really reproducible, though; sometimes it is client D that stops seeing updates, or any other client.
>>  When taking into account that the clients I am talking about here are web servers, you can imagine that this behavior is less than desirable.
>>
>>  Now for the versions:
>>  Clients are a mix of 1.4.x and 1.6.x (mainly 1.6.1-1.el5 and 1.4.12-el5, with other versions thrown into the mix).
>>  Servers are version 1.4.11-el5.
>>  OS on both clients and servers is RHEL5.
>>
>>  It might be a coincidence, but we became aware of this problem shortly after updating a bunch of clients to 1.6.1-1.el5.
>>
>>  Any ideas on how to go about debugging this?
>>
>>    Dipl. Chem. Dr. Stephan Wonczak
>>
>>        Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
>>        Universitaet zu Koeln, Weyertal 121, 50931 Koeln
>>        Tel: +49/(0)221/470-89583, Fax: +49/(0)221/470-89625
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>
>

   Dipl. Chem. Dr. Stephan Wonczak

         Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
         Universitaet zu Koeln, Weyertal 121, 50931 Koeln
         Tel: +49/(0)221/470-89583, Fax: +49/(0)221/470-89625