[OpenAFS] Delete on large directory tree causes client lockup

Moritz Bechler mbechler@eenterphace.org
Sun, 06 Apr 2008 23:03:49 +0200


Hi,

When deleting a somewhat large directory tree from AFS (the reproduction 
case I am using at the moment is an OpenAFS source tree, but this has 
already happened with considerably smaller trees as well), we experience 
lockups of the complete AFS client: all other filesystem calls on the 
client start to block. This is quite bad, as our home directories are 
stored in /afs.

We are currently using openafs-1.5.33 on Gentoo Linux (kernel 2.6.22), 
but used 1.4.6 some time before (and hoped to fix the problem by 
upgrading). I have not tested it recently, but when testing the Windows 
client it seemed that the same or a similar problem existed there too.

- cmdebug -long <host> seems to be unable to communicate with the 
cache manager
- we cannot get the client to produce debug output (-debug/-logfile do 
not seem to do anything)
- strace on rm shows unlinkat() blocking
- the removal seems to get slower at first and then finally locks up 
(I once left it running for a day; nothing happened)
- a filtered pcap can be found at 
http://mbechler.eenterphace.org/afs/afs.pcap, along with a verbose 
fileserver log (http://mbechler.eenterphace.org/afs/fs.log) covering the 
testing timespan. The last packet that looks regular is number 5264.
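In case it helps others reproduce the slowdown described above, a minimal 
sketch of a helper that deletes a tree depth-first and times every 
unlinkat() call could look like this. It is a hypothetical tool, not 
part of OpenAFS: the names timed_rmtree() and threshold_ms are my own, 
and it assumes a POSIX system providing nftw(). It only reports slow 
calls on stderr; it does not diagnose the cache manager itself.

```c
#define _XOPEN_SOURCE 700   /* nftw(), unlinkat(), clock_gettime() */
#include <assert.h>
#include <fcntl.h>          /* AT_FDCWD, AT_REMOVEDIR */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper state: calls slower than this (ms) are reported. */
static double slow_threshold_ms;
static int failures;

static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

/* nftw() callback: with FTW_DEPTH, children are visited before their
 * parent directory, so each directory is already empty when reached. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    int flags = (typeflag == FTW_DP) ? AT_REMOVEDIR : 0;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = unlinkat(AT_FDCWD, path, flags);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ms = elapsed_ms(&start, &end);
    if (rc != 0) {
        perror(path);
        failures++;
    } else if (ms >= slow_threshold_ms) {
        fprintf(stderr, "slow unlinkat: %.1f ms %s\n", ms, path);
    }
    return 0; /* keep walking even after a failed removal */
}

/* Deletes the tree rooted at `root`. Returns the number of failed
 * unlinkat() calls, or -1 if the walk itself could not be started. */
int timed_rmtree(const char *root, double threshold_ms)
{
    assert(root != NULL);
    slow_threshold_ms = threshold_ms;
    failures = 0;
    /* FTW_PHYS: do not follow symlinks; 64: max open directory fds. */
    if (nftw(root, remove_entry, 64, FTW_DEPTH | FTW_PHYS) != 0)
        return -1;
    return failures;
}
```

Calling e.g. timed_rmtree(path, 1000.0) on a copy of the source tree would 
print every entry whose removal took at least one second. If the client 
hangs completely, the call will not return either, but the last lines 
printed may show where the delays start to grow.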

If we can provide further debugging information, we are happy to do so.

With best regards,

Moritz Bechler