[OpenAFS-devel] patch against deadlock

Hartmut Reuter reuter@rzg.mpg.de
Wed, 22 Feb 2006 12:29:58 +0100


I saw a deadlock between an "ls" and an "rm" command on our Regatta AIX
5.2 system, which I was able to analyze using kdb.

It turned out that the lock order was violated in afs_vnop_remove.c:
"rm" held a dcache lock when trying to obtain a vcache lock.
"ls" held a read lock on the vcache and tried to get the dcache lock.

--- afs_vnop_remove.c.orig      2005-05-30 06:05:44.000000000 +0200
+++ afs_vnop_remove.c   2006-02-21 16:05:06.000000000 +0100
@@ -349,6 +349,8 @@
      if (tvc && osi_Active(tvc)) {
         /* about to delete whole file, prefetch it first */
         ReleaseWriteLock(&adp->lock);
+       if (tdc)
+           ReleaseSharedLock(&tdc->lock);
         ObtainWriteLock(&tvc->lock, 143);
  #if    defined(AFS_OSF_ENV)
         afs_Wire(tvc, &treq);
@@ -357,6 +359,8 @@
  #endif
         ReleaseWriteLock(&tvc->lock);
         ObtainWriteLock(&adp->lock, 144);
+        if (tdc)
+           ObtainSharedLock(&tdc->lock, 1638);
      }

      osi_dnlc_remove(adp, aname, tvc);


This diff applies to OpenAFS 1.4.0. The lock number 1638 is meant as a
reminder of 638, the number under which the lock was obtained before.
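
The pattern the patch applies (release the dcache lock before taking the
vcache lock, then re-obtain it afterwards) can be illustrated with a
minimal, self-contained sketch. This is not OpenAFS code: it assumes
plain pthread mutexes in place of the AFS lock package, and the names
vcache_lock, dcache_lock, remover, lister and run_both are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for tvc->lock and tdc->lock. */
static pthread_mutex_t vcache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dcache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each counter is written by exactly one thread. */
static int removals_done;
static int listings_done;

/* "rm" side: starts out holding the dcache lock and needs the vcache lock. */
static void *remover(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&dcache_lock);
        /* Drop the dcache lock first, so that we never wait for the
         * vcache lock while still holding the dcache lock. */
        pthread_mutex_unlock(&dcache_lock);
        pthread_mutex_lock(&vcache_lock);
        removals_done++;
        pthread_mutex_unlock(&vcache_lock);
        /* Re-obtain the dcache lock for the rest of the operation. */
        pthread_mutex_lock(&dcache_lock);
        pthread_mutex_unlock(&dcache_lock);
    }
    return NULL;
}

/* "ls" side: takes the vcache lock first, then the dcache lock. */
static void *lister(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&vcache_lock);
        pthread_mutex_lock(&dcache_lock);
        listings_done++;
        pthread_mutex_unlock(&dcache_lock);
        pthread_mutex_unlock(&vcache_lock);
    }
    return NULL;
}

/* Runs both sides concurrently; returns the total number of completed
 * iterations, or -1 if a thread could not be created. */
int run_both(int rounds)
{
    pthread_t rm_thread, ls_thread;

    assert(rounds >= 0);
    removals_done = 0;
    listings_done = 0;
    if (pthread_create(&rm_thread, NULL, remover, &rounds) != 0)
        return -1;
    if (pthread_create(&ls_thread, NULL, lister, &rounds) != 0) {
        pthread_join(rm_thread, NULL);
        return -1;
    }
    pthread_join(rm_thread, NULL);
    pthread_join(ls_thread, NULL);
    return removals_done + listings_done;
}
```

In this sketch remover never waits for one lock while holding the other,
so the circular wait cannot form. If the unlock before taking vcache_lock
is removed, the two threads can block each other in the same way "rm"
and "ls" did. Note that anything protected by the dropped lock may have
changed by the time it is re-obtained.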

Hartmut
-----------------------------------------------------------------
Hartmut Reuter                           e-mail reuter@rzg.mpg.de
					   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------