[OpenAFS] out-of-sync files in client cache (1.4.0)

Peter Somogyi psomogyi@gamax.hu
Thu, 20 Apr 2006 14:43:54 +0200


We have a problem at a customer site (in a production environment): the machines
run SLES9, kernel 2.6.5-7.193-smp, 586.
All OpenAFS clients and servers are 1.4.0.

Symptom: when a client creates, deletes, or modifies any file in a given directory,
it sees the changes, but a small portion of the other clients don't see any
changes, and those problematic clients stay in this "out-of-sync" state for 2-3 hours.
The "small portion" means that 1-4 of the 20-40 clients don't see the changes.
But these clients are alive: when I write a file on one of them, the change
can be seen both on that client and on the other "good" clients.

I've looked at the tcpdumps, and the problem is that the server somehow
doesn't send "Operation callback(204)" to these (1-4) problematic clients
when a file or directory changes. (The other clients are notified.)

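To make the symptom concrete, here is a toy model in Python (not OpenAFS code) of the AFS callback idea: a client that holds a callback promise trusts its cached copy without asking the server, and the fileserver is supposed to "break" that promise when the file changes. The model shows why a client whose callback break is lost keeps serving stale data until its promise runs out on its own. The class names, the `reachable` flag used to simulate a lost break, and the two-hour `PROMISE_LIFETIME` are all hypothetical; the real fileserver picks its own callback lifetimes.

```python
from dataclasses import dataclass, field

# Hypothetical callback promise lifetime in seconds (illustrative only;
# not the real fileserver's value).
PROMISE_LIFETIME = 2 * 60 * 60


@dataclass
class Client:
    name: str
    cached_version: int = 0
    promise_expires: float = 0.0  # cache is trusted until this time
    reachable: bool = True        # False simulates a lost callback break

    def read(self, server: "FileServer", now: float) -> int:
        # While the promise is valid, serve the cached copy without
        # contacting the server at all.
        if now < self.promise_expires:
            return self.cached_version
        # Promise expired or broken: fetch fresh data and a new promise.
        self.cached_version = server.version
        self.promise_expires = now + PROMISE_LIFETIME
        server.holders.add(self.name)
        return self.cached_version


@dataclass
class FileServer:
    version: int = 0
    holders: set = field(default_factory=set)
    clients: dict = field(default_factory=dict)

    def store(self) -> None:
        self.version += 1
        # Break the callback of every client currently holding a promise.
        for name in list(self.holders):
            client = self.clients[name]
            if client.reachable:
                client.promise_expires = 0.0  # break delivered
            # Otherwise the break is silently lost in this model.
        self.holders.clear()


server = FileServer()
good, bad = Client("good"), Client("bad")
server.clients = {c.name: c for c in (good, bad)}
good.read(server, now=0)
bad.read(server, now=0)
bad.reachable = False  # this client will miss the callback break
server.store()
print(good.read(server, now=20), bad.read(server, now=20))
print(bad.read(server, now=PROMISE_LIFETIME + 1))
```

In this model the "bad" client keeps returning version 0 after the change and only picks up version 1 once its own promise expires, which matches the shape of the reported symptom; whether a lost break is really the cause here is exactly the open question.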
Has anybody run into the same problem? We would appreciate any suggestions, ideas, or help.

Client config:
-stat 50000 -dcache 4200 -daemons 6 -volumes 256 -nosettime -chunksize 17 -rxpck 1500

Fileserver config: (BosConfig)
restarttime 16 0 0 0 0
checkbintime 3 0 5 0 0
bnode fs fs 0
parm /usr/lib/openafs/fileserver -pctspare 10 -L -udpsize 1310720 -nojumbo -abortthreshold 0 -busyat 1800
parm /usr/lib/openafs/volserver -p 16 -syslog -udpsize 1310720 -nojumbo
parm /usr/lib/openafs/salvager -parallel 4 -syslog -DontSalvage

- the server is heavily loaded (both CPU and memory; some other heavy
applications are running there, too)
- I couldn't find any exceptional messages in the logs
- reproducibility: it occurs sporadically in the production environment, every
1-3 days for a few hours

If anyone is interested in the details (tcpdumps, log files, configs), I can
send some, but I may first have to ask our customer for permission to
request and forward them, and this may take time.

