[OpenAFS-devel] OOPS of OpenAFS 1.4.4 on Linux 2.6.18

Chaskiel M Grundman cg2v@andrew.cmu.edu
Mon, 14 May 2007 15:58:55 -0400


--On Monday, May 14, 2007 05:15:53 PM +0200 Erland Lewin <erland@lewin.nu> 
wrote:

> sol kernel: kernel BUG at
> /usr/src/afs/openafs-1.4.4/src/libafs/MODLOAD-2.6.18-SP/afs_dcache.c:2395!

There was another report of a crash in that part of the code a few weeks 
ago. Was your cache partition full?


Analysis and possible fixes:

The comment above this block says:

        /* now, if code != 0, we have an error and should punt.
         * note that we have the vcache write lock, either because
         * !setLocks or slowPass.
         */
        if (code) {

It turns out that this is not the case for a dynroot vcache, because the 
dynroot code path does not retry with slowPass=1 on error conditions. With 
setLocks set and slowPass still 0, the osi_Assert(!setLocks || slowPass) in 
this error block fails, which appears to be the BUG reported above. I have 
two proposed fixes for this. The first runs dynroot fetches with 
slowPass=1, since they don't have to wait for network I/O (and so won't be 
holding the write lock across a "slow" network operation). The second 
assumes that dynroot vcaches don't need all the same callback processing as 
normal vcaches and can get away with not getting a write lock in the error 
case.
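
To make the failing case concrete, here is a rough toy model of the 
control flow described above. It is only a sketch and not the actual 
afs_GetDCache code: the function error_path_trips_assert and the enum 
fix are hypothetical names invented for illustration, the fetch is 
assumed to always fail, and the retry logic is reduced to a single 
flag. It only shows when the condition !setLocks || slowPass would be 
false on reaching the error block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical selector for the two proposed fixes (not OpenAFS names). */
enum fix {
    FIX_NONE,                 /* unpatched behaviour */
    FIX_FORCE_SLOWPASS,       /* idea of patch #1: dynroot always uses slowPass */
    FIX_SKIP_DYNROOT_CLEANUP  /* idea of patch #2: skip the block for dynroot */
};

/*
 * Toy model: returns true if the error block would hit
 * osi_Assert(!setLocks || slowPass) with the condition false.
 * Assumption: the fetch always fails, so we always reach the error path.
 */
bool error_path_trips_assert(bool setLocks, bool is_dynroot, enum fix fix)
{
    bool slowPass = false;

    for (;;) {
        /* Patch #1 idea: dynroot fetches take the write lock up front. */
        if (setLocks && is_dynroot && fix == FIX_FORCE_SLOWPASS)
            slowPass = true;

        /*
         * Normal vcaches that fail on the fast (read-locked) pass are
         * retried with slowPass set. Per the analysis above, the dynroot
         * path does not do this retry.
         */
        if (setLocks && !slowPass && !is_dynroot) {
            slowPass = true;
            continue;
        }
        break;
    }

    /* Patch #2 idea: dynroot vcaches never evaluate the assertion. */
    if (is_dynroot && fix == FIX_SKIP_DYNROOT_CLEANUP)
        return false;

    return !(!setLocks || slowPass);
}
```

In this model only the combination setLocks=1, dynroot, no fix trips the 
assertion; a normal vcache is rescued by the retry, and a caller that 
passes setLocks=0 is assumed to hold the lock already.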

Patch #1:

--- src/afs/afs_dcache.c        2007-05-14 12:57:29.000000000 -0400
+++ src/afs/afs_dcache.c   2007-05-14 12:10:28.000000000 -0400
@@ -1545,6 +1545,8 @@
     setNewCallback = setVcacheStatus = 0;

     if (setLocks) {
+       if (afs_IsDynroot(avc))
+            slowPass = 1;
        if (slowPass)
            ObtainWriteLock(&avc->lock, 616);
        else


Patch #2:

--- src/afs/afs_dcache.c        2007-05-14 12:57:29.000000000 -0400
+++ src/afs/afs_dcache.c   2007-05-14 15:50:02.000000000 -0400
@@ -2382,17 +2382,19 @@
            }
            ReleaseWriteLock(&tdc->lock);
            afs_PutDCache(tdc);
-           ObtainWriteLock(&afs_xcbhash, 454);
-           afs_DequeueCallback(avc);
-           avc->states &= ~(CStatd | CUnique);
-           ReleaseWriteLock(&afs_xcbhash);
-           if (avc->fid.Fid.Vnode & 1 || (vType(avc) == VDIR))
-               osi_dnlc_purgedp(avc);
-           /*
-            * Locks held:
-            * avc->lock(W); assert(!setLocks || slowPass)
-            */
-           osi_Assert(!setLocks || slowPass);
+           if (!afs_IsDynroot(avc)) {
+               ObtainWriteLock(&afs_xcbhash, 454);
+               afs_DequeueCallback(avc);
+               avc->states &= ~(CStatd | CUnique);
+               ReleaseWriteLock(&afs_xcbhash);
+               /*
+                * Locks held:
+                * avc->lock(W); assert(!setLocks || slowPass)
+                */
+               osi_Assert(!setLocks || slowPass);
+               if (avc->fid.Fid.Vnode & 1 || (vType(avc) == VDIR))
+                   osi_dnlc_purgedp(avc);
+           }
            tdc = NULL;
            goto done;
        }