[OpenAFS] Strange caching failures
Stephan Wonczak
a0033@rrz.uni-koeln.de
Thu, 28 Feb 2013 15:37:29 +0100 (CET)
Hi all!
For the past few weeks, we have been struck by some very weird behavior
regarding cache updates on AFS clients. It looks like the callback
sometimes does not work, leaving one client stuck with an older version
of the file in question. Example:
Write to file 'foo' on client A every five minutes.
Clients B, C and D dutifully update their caches and see the updates.
After some time, client B suddenly does not see the updates any more,
while clients C and D continue working fine.
A 'fs flush foo' on client B corrects the problem.
Other files are *NOT* affected and are updated fine on Client B.
This behavior is not really reproducible, though; sometimes it is client D
that stops working, or any other client.
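To catch the moment a client goes stale, the scenario above could be
turned into a small heartbeat check. This is only a rough sketch, not
something we are running: the function names, the 300 second interval
and the slack factor are all made up for illustration, and it assumes
the clocks on all clients are reasonably in sync. Client A writes the
current time into 'foo'; every other client reads it back and complains
if the content it sees is older than a couple of write intervals.

```python
import time


def write_heartbeat(path, now=None):
    """Run on the writing client (A): store the current epoch time in the file."""
    now = time.time() if now is None else now
    with open(path, "w") as f:
        f.write("%d\n" % int(now))


def heartbeat_age(path, now=None):
    """Run on a reading client (B, C, D): seconds since the visible content was written."""
    now = time.time() if now is None else now
    with open(path) as f:
        written = int(f.read().strip())
    return now - written


def is_stale(path, interval=300, slack=2, now=None):
    """True if the locally visible content is older than `slack` write intervals.

    `interval` and `slack` are hypothetical defaults matching the
    five-minute writer described above.
    """
    return heartbeat_age(path, now) > interval * slack
```

Run from cron on each reading client, a check like is_stale() would at
least record when a given client stopped seeing updates, which could
then be compared against the logs from that time.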
When taking into account that the clients I am talking about here are
web servers, you can imagine that this behavior is less than desirable.
Now for the versions:
Clients are a mix of 1.4.x and 1.6.x (mainly 1.6.1-1.el5 and 1.4.12-el5,
with other versions thrown into the mix).
Servers are version 1.4.11-el5.
OS on both clients and servers is RHEL5.
It might be a coincidence, but we became aware of this problem shortly
after updating a bunch of clients to 1.6.1-1.el5.
Any ideas on how to go about debugging this?
Dipl. Chem. Dr. Stephan Wonczak
Regionales Rechenzentrum der Universitaet zu Koeln (RRZK)
Universitaet zu Koeln, Weyertal 121, 50931 Koeln
Tel: +49/(0)221/470-89583, Fax: +49/(0)221/470-89625