[OpenAFS] Delete on large directory tree causes client lockup
Moritz Bechler
mbechler@eenterphace.org
Sun, 06 Apr 2008 23:03:49 +0200
Hi,
When we delete a moderately large directory tree from AFS (my current
reproduction case is an OpenAFS source tree, but it has already happened
with considerably smaller trees as well), the entire AFS client locks up:
every other filesystem call on the client starts to block. This is quite
bad for us, as our home directories are stored in /afs.
We are currently using openafs-1.5.33 on Gentoo Linux (kernel 2.6.22);
before that we used 1.4.6 and hoped that upgrading would fix the problem.
I have not tested it recently, but when I tested the Windows client, the
same or a similar problem seemed to exist there too.
- cmdebug -long <host> seems to be unable to communicate with the
  cache manager
- we cannot get the client to produce any debug output (-debug and
  -logfile do not seem to have any effect)
- strace on rm shows unlinkat() blocking
- the removal seems to get slower at first and then finally locks up
  (I once left it running for a day; nothing happened)
- a filtered pcap is available at
  http://mbechler.eenterphace.org/afs/afs.pcap, along with a verbose
  fileserver log (http://mbechler.eenterphace.org/afs/fs.log) covering the
  test period. The last packet that looks normal is packet 5264.
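To pin down where the slowdown sets in before the hang, one could time each removal call individually instead of running a plain rm. What follows is a minimal sketch, assuming Python 3 is available on the client; the script and its function names are hypothetical and not part of OpenAFS. It removes the tree bottom-up and flushes one line per unlink()/rmdir() call, so the last completed operation stays visible on the terminal even if the next call blocks forever.

```python
import os
import sys
import time


def _timed(op, path, timings, log):
    """Run op(path), record how long it took, and log it immediately."""
    start = time.monotonic()
    op(path)
    elapsed = time.monotonic() - start
    timings.append((path, elapsed))
    # Flush right away: if the next call never returns, this line
    # still shows the last operation that completed.
    print(f"{elapsed:9.4f}s {path}", file=log, flush=True)


def timed_remove_tree(root, log=sys.stderr):
    """Hypothetical helper: delete a tree, timing every removal call.

    Returns a list of (path, seconds) tuples in removal order.
    """
    timings = []
    # topdown=False yields children before parents, so every directory
    # is already empty when rmdir() is called on it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            _timed(os.unlink, os.path.join(dirpath, name), timings, log)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # os.walk lists symlinks to directories under dirnames;
            # those must be unlinked, not rmdir'ed.
            op = os.unlink if os.path.islink(path) else os.rmdir
            _timed(op, path, timings, log)
    _timed(os.rmdir, root, timings, log)
    return timings


if __name__ == "__main__":
    timed_remove_tree(sys.argv[1])
```

Run it with the tree to delete as its only argument. Growing per-call times in the output would match the "slower at first" behaviour described above, and the last printed path identifies the entry whose removal blocked.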
If we can provide any further debugging information, we are happy to do so.
With best regards,
Moritz Bechler