[OpenAFS-devel] Linux 2.6.12 kernel BUG at fs/namei.c:1189

Rainer Toebbicke rtb@pclella.cern.ch
Fri, 20 Jan 2006 10:49:04 +0100



If we're talking about UP kernels, please ignore my intrusion into
this discussion, as my observations are about locking. Otherwise:

There is a bug in the handling of inode cleanup (afs_put_inode).
Prior to 1.4.0, systems were likely to deadlock if the cleanup
happened at the "right" time; then a patch (put-inode-speedup-20050815)
went in that removed the lock rather abruptly, without really
considering whether one was required.

We discussed this back in mid-November. At the time we ran into the
issue here during 1.4.0+ stress tests, and I started fixing it as far
as I understood it. I now see that the later
linux-afs-put-inode-dont-race-20051128 puts the lock back; however, in
my opinion it remains wrong (see the comments from November).

I attach a patch from that epoch that we have now been testing
successfully for two months, but it still contains "hack" comments and
printks that make it unsuitable for mainstream. The patch is against
something like 1.4.1-rc1.

The issue: when running out of memory, the (Linux) kernel cleans up
inodes/dentries, usually through kswapd. In that process the AFS code
gets upcalled and is expected to handle its share. Unfortunately, at
that point it is not easy to predict which locks are held; simply
AFS_GLOCK()ing, for example, can already hang the system. The patch
currently, rather brutally, avoids doing anything at certain moments:
when the current process is [one of the] kswapd[s], or when we cannot
acquire the global lock. In any case, it makes sure that the periodic
afs_vcache cleanup catches the cases that were dropped because the
context was not favourable.

I'm pretty sure the put-inode-speedup-20050815 delta favours panics,
as that's how I noticed it; I don't remember, however, whether it is
in 1.4.0. Hangs involving kswapd are possible in all scenarios and
are (modulo limited experience so far) avoided with this patch.


Harald Barth wrote:
> Sorry, I have to bring back this thread from the dead. I have a user who
> had me run into the same BUG.
> 
....
> 
>>We *did* run into something that looked like this problem on another Linux
>>system here with files that weren't mount points, but we could never
>>manage to reproduce it and never got BUG output, so that may or may not be
>>the same problem.  :/
> 
> 
> The BUG output and the user say he tried to rm a file, and then all
> access to that directory hung. The thing that puzzles me is that
> normally people don't use the /afs/.pdc.kth.se/home/$USER way through
> the RO to come to their RW home. I have not been able to reproduce it
> that way either. Despite ~1000 machines with OpenAFS, the bug has
> happened to me only twice, once in October and now some days ago. Not
> precisely a lot of data to debug. Not that I'd know how to validate
> the cached dentries anyway. The client version is 1.4.0, btw.
> 

[Attachment: patch_Linux_deadlocks]

--- openafs/src/afs/LINUX/osi_machdep.h.5rig	2005-07-11 21:29:56.000000000 +0200
+++ openafs/src/afs/LINUX/osi_machdep.h	2005-11-15 17:34:37.000000000 +0100
@@ -202,6 +202,8 @@
 	 afs_global_owner = current->pid; \
 } while (0)
 
+#define AFS_TRY_GLOCK() ( down_trylock(&afs_global_lock) ? 0 : (afs_global_owner = current->pid) )
+
 #define ISAFS_GLOCK() (afs_global_owner == current->pid)
 
 #define AFS_GUNLOCK() \
--- openafs/src/afs/LINUX/osi_vnodeops.c.5rig	2005-11-15 15:31:21.000000000 +0100
+++ openafs/src/afs/LINUX/osi_vnodeops.c	2005-11-16 11:11:19.000000000 +0100
@@ -611,6 +611,15 @@
     return;
 }
 
+
+/* <HACK> system low on memory, somebody with AFS_GLOCK triggers wakeup_kswapd and waits;
+   kswapd calls prune_dcache which calls this routine -> deadlock on AFS_GLOCK
+*/
+static char afs_kswapd[6] = "kswapd";	/* without nullchar, matches 'kswapd.*' */
+#define AFS_CLEANUP_HACK() ( (memcmp(current->comm, afs_kswapd, sizeof(afs_kswapd))) ? \
+    0 : printk("AFS: %s:%d near deadlock miss!\n", __FILE__, __LINE__))
+/* </HACK> */
+
 /* afs_linux_revalidate
  * Ensure vcache is stat'd before use. Return 0 if entry is valid.
  */
@@ -622,6 +631,8 @@
     cred_t *credp;
     int code;
 
+    if (AFS_CLEANUP_HACK()) return(0);
+
 #ifdef AFS_LINUX24_ENV
     lock_kernel();
 #endif
@@ -691,6 +702,8 @@
     struct vcache *vcp, *pvcp, *tvc = NULL;
     int valid;
 
+    if (AFS_CLEANUP_HACK()) return(0);
+
 #ifdef AFS_LINUX24_ENV
     lock_kernel();
 #endif
@@ -788,10 +801,14 @@
 {
     struct vcache *vcp = VTOAFS(ip);
 
-    AFS_GLOCK();
-    if (vcp->states & CUnlinked)
+    /* Careful about the global lock here: we might already hold it (and 
+       afs_InactiveVCache might even temporarily release it); or we could be
+       kswapd and the global_owner is waiting for us to clean up dentries */
+
+    if ((vcp->states & CUnlinked) && AFS_TRY_GLOCK()) {
 	(void) afs_InactiveVCache(vcp, NULL);
-    AFS_GUNLOCK();
+	AFS_GUNLOCK();
+    } /* else we trust afs_FlushActiveVcaches to clean up! */
 
     iput(ip);
 }
--- openafs/src/afs/LINUX/osi_vfsops.c.5rig	2005-10-13 20:08:40.000000000 +0200
+++ openafs/src/afs/LINUX/osi_vfsops.c	2005-11-15 17:33:29.000000000 +0100
@@ -343,12 +343,15 @@
 {
     struct vcache *vcp = VTOAFS(ip);
 
-    if (VREFCOUNT(vcp) == 2) {
-	AFS_GLOCK();
+    if (VREFCOUNT(vcp) == 2 && AFS_TRY_GLOCK()) {
+	ObtainWriteLock(&vcp->lock, 562);
 	if (VREFCOUNT(vcp) == 2)
 	    afs_InactiveVCache(vcp, NULL);
+	ReleaseWriteLock(&vcp->lock);
 	AFS_GUNLOCK();
     }
+    /* if we could not obtain the global lock, we rely
+       on afs_FlushActiveVcaches to clean this up */
 }
 
 /* afs_put_super
--- openafs/src/afs/afs_vcache.c.5rig	2005-11-09 12:38:46.000000000 +0100
+++ openafs/src/afs/afs_vcache.c	2005-11-16 10:33:21.000000000 +0100
@@ -956,7 +956,9 @@
 #if defined(AFS_OSF_ENV) || defined(AFS_LINUX22_ENV)
     /* Hold it for the LRU (should make count 2) */
     VN_HOLD(AFSTOV(tvc));
+#define UNUSED_VNODE_REFCOUNT 1
 #else /* AFS_OSF_ENV */
+#define UNUSED_VNODE_REFCOUNT 0
 #if !(defined (AFS_DARWIN_ENV) || defined(AFS_XBSD_ENV))
     VREFCOUNT_SET(tvc, 1);	/* us */
 #endif /* AFS_XBSD_ENV */
@@ -1081,6 +1083,7 @@
     tvc->states &=~ CVInit;
     afs_osi_Wakeup(&tvc->states);
 
+if (VREFCOUNT(tvc) != UNUSED_VNODE_REFCOUNT+1) { static int xxx=0; if (xxx++ < 20) printf("%s:%d: new vnode refcount = %ld\n", __FILE__, __LINE__, VREFCOUNT(tvc)); }
     return tvc;
 
 }				/*afs_NewVCache */
@@ -1153,7 +1156,7 @@
 #endif
 	    }
 	    didCore = 0;
-	    if ((tvc->states & CCore) || (tvc->states & CUnlinkedDel)) {
+	    if (tvc->states & (CCore | CUnlinkedDel | CUnlinked)) {
 		/*
 		 * Don't let it evaporate in case someone else is in
 		 * this code.  Also, drop the afs_xvcache lock while
@@ -1209,6 +1212,10 @@
 		    AFS_RWLOCK((vnode_t *) tvc, VRWLOCK_WRITE);
 #endif
 		} else {
+		    /* Unused: refcount = 1 (Linux, e.g. after quitting afs_put_inode, OSF?),
+		       0 elsewhere. Compare with 2 as we just increased it by one */
+		    if ( (tvc->states & CUnlinked) && !VREFCOUNT_GT(tvc, UNUSED_VNODE_REFCOUNT+1) )
+			afs_InactiveVCache(tvc, NULL);
 		    /* lost (or won, perhaps) the race condition */
 		    ReleaseWriteLock(&tvc->lock);
 #ifdef AFS_BOZONLOCK_ENV

