[OpenAFS-devel] appears that cvs trunk has a deadlock problem of some sort for linux24

Nathan Neulinger nneul@umr.edu
Thu, 4 Apr 2002 12:52:53 -0600



I applied the attached debugging patch, and it shifts the problem. Basically, I wind
up with the following trace:

looping in NEWVC for cur
about to gunlock
about to dunlock
about to d_drop
about to dput
about to glock
about to restart
about to dlock
about to dunlock
	HANG

It would lock up much earlier in TFDC without the patch.

I'm not sure how that hang can be occurring unless you can't DUNLOCK when GLOCK is held. 

Would appreciate it if someone with more knowledge of what this code is supposed to do
could look it over. 

BTW, in case it matters, I've seen the hang on both a UP and an SMP 2.4.18 system; both are
running the SMP kernel and the MP libafs build, though.

-- Nathan

On Thu, Apr 04, 2002 at 11:41:59AM -0600, Neulinger, Nathan wrote:
> Yes, it's hanging in a dlock. My tracing indicates:
> 
> about to TFDC
> about to DLOCK in tfdc
> 
> and that's where it hangs.
> 
> Based on the comments above TryFlush, the code of TF doesn't make sense.
> It says it maintains the vcache lock exclusively and that it is called
> with that lock held. Well, the first thing it does is try to lock it
> again. Additionally, shortly after it exits, the caller will also try to
> DUNLOCK(). Looks to me like the code needs to just DUNLOCK()..DLOCK() in
> the loop in TF, and get rid of the outer DLOCK/DUNLOCK. I'm trying
> something along those lines right now. 
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       EMail:  nneul@umr.edu
> University of Missouri - Rolla         Phone: (573) 341-4841
> Computing Services                       Fax: (573) 341-4216
> 
> 
> > -----Original Message-----
> > From: Neulinger, Nathan 
> > Sent: Thursday, April 04, 2002 11:30 AM
> > To: openafs-devel@openafs.org
> > Subject: RE: [OpenAFS-devel] appears that cvs trunk has a 
> > deadlock problem of some sort for linux24
> > 
> > 
> > > Yep. Definitely hanging in TryFlush... I temporarily commented out
> > > the call to it and the problem went away.
> > 
> > I'm willing to bet one of those DLOCK()'s is spinning. 
> > 
> > Something else that looks odd to me:
> > 
> > In TryFlush:
> > 
> >        if (!DCOUNT(dentry) && !dentry->d_inode) {
> >             DGET(dentry);
> >             AFS_GUNLOCK();
> >             DUNLOCK();
> > 
> > but in newVCache:
> > 
> >                         if (DCOUNT(dentry)) {
> >                             afs_TryFlushDcacheChildren(dentry);
> >                         }
> > 
> >                         if (!DCOUNT(dentry)) {
> >                             AFS_GUNLOCK();
> >                             DGET(dentry);
> >                             DUNLOCK();
> >                             d_drop(dentry);
> >                             dput(dentry);
> >                             AFS_GLOCK();
> >                             goto restart;
> > 
> > > Notice how in NVC it does the dget after the gunlock? Why isn't TryFlush
> > > doing it the same way?
> > 
> > I'd bet though that it's one of the DLOCK() calls, I'm trying to trace
> > it out now... 
> > 
> > -- Nathan
> > 
> > ------------------------------------------------------------
> > Nathan Neulinger                       EMail:  nneul@umr.edu
> > University of Missouri - Rolla         Phone: (573) 341-4841
> > Computing Services                       Fax: (573) 341-4216
> > 
> > 
> > > -----Original Message-----
> > > From: Neulinger, Nathan 
> > > Sent: Thursday, April 04, 2002 10:34 AM
> > > To: openafs-devel@openafs.org
> > > Subject: RE: [OpenAFS-devel] appears that cvs trunk has a 
> > > deadlock problem of some sort for linux24
> > > 
> > > 
> > > > At first glance, it looks like the TryFlush... routine is called inside a
> > > > DLOCK(), and itself does a DLOCK(). Not sure if that is kosher or not.
> > > 
> > > -- Nathan
> > > 
> > > ------------------------------------------------------------
> > > Nathan Neulinger                       EMail:  nneul@umr.edu
> > > University of Missouri - Rolla         Phone: (573) 341-4841
> > > Computing Services                       Fax: (573) 341-4216
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Neulinger, Nathan 
> > > > Sent: Thursday, April 04, 2002 10:15 AM
> > > > To: openafs-devel@openafs.org
> > > > Subject: [OpenAFS-devel] appears that cvs trunk has a 
> > > > deadlock problem of some sort for linux24
> > > > 
> > > > 
> > > > > It's been introduced since 2002/03/26. The only changes I see in the trunk
> > > > > since then are the fake stat code and the flush-dcache stuff.
> > > > 
> > > > > Basically, my "crash afsd" script (yes, it's useful enough that I keep a
> > > > > script around for some testing) runs for a while, and then the machine
> > > > > completely locks up. No panic message, nothing. The only thing that
> > > > > responds is Alt-SysRq S/U/B.
> > > > 
> > > > The script that I test with just does:
> > > > 
> > > > find /umr/s/openafs/ -follow -type f -print | xargs -P 8 -n 30 wc
> > > > 
> > > > > I have not tried pulling out any of the dcache or fakestat changes to see
> > > > > if reverting them helps. Will try to get more info soon unless someone
> > > > > else spots the problem first. (I assumed it was something with the
> > > > > prototypes branch, but verified against a build from the trunk.)
> > > > 
> > > > -- Nathan
> > > > 
> > > > ------------------------------------------------------------
> > > > Nathan Neulinger                       EMail:  nneul@umr.edu
> > > > University of Missouri - Rolla         Phone: (573) 341-4841
> > > > Computing Services                       Fax: (573) 341-4216
> > > > _______________________________________________
> > > > OpenAFS-devel mailing list
> > > > OpenAFS-devel@openafs.org
> > > > https://lists.openafs.org/mailman/listinfo/openafs-devel
> > > > 
-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216

Attachment: vc.diff

Index: afs_vcache.c
===================================================================
RCS file: /cvs/openafs/src/afs/afs_vcache.c,v
retrieving revision 1.26
diff -u -r1.26 afs_vcache.c
--- afs_vcache.c	2002/04/02 05:09:52	1.26
+++ afs_vcache.c	2002/04/04 18:46:59
@@ -480,18 +480,22 @@
  repeat:
     next = this_parent->d_subdirs.next;
  resume:
-    DLOCK();
     while (next != &this_parent->d_subdirs) {
 	struct list_head *tmp = next;
 	struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
 
+afs_warnuser("loop for subdirs in TFDC\n");
+
 	next = tmp->next;
 	if (!DCOUNT(dentry) && !dentry->d_inode) {
 	    DGET(dentry);
 	    AFS_GUNLOCK();
+afs_warnuser("about to DUNLOCK in TFDC\n");
 	    DUNLOCK();
 	    d_drop(dentry);
 	    dput(dentry);
+afs_warnuser("about to DLOCK in TFDC\n");
+	    DLOCK();
 	    AFS_GLOCK();
 	    goto repeat;
 	}
@@ -499,20 +503,23 @@
 	 * Descend a level if the d_subdirs list is non-empty.
          */
 	if (!list_empty(&dentry->d_subdirs)) {
+afs_warnuser("descending in TFDC\n");
 	    this_parent = dentry;
 	    goto repeat;
 	}
     }
-    DUNLOCK();
 
     /*
      * All done at this level ... ascend and resume the search.
      */
     if (this_parent != parent) {
+afs_warnuser("ascending in TFDC\n");
 	next = this_parent->d_child.next;
 	this_parent = this_parent->d_parent;
 	goto resume;
     }
+
+afs_warnuser("done with TFDC\n");
 }
 #endif /* AFS_LINUX22_ENV */
 
@@ -640,28 +647,43 @@
 		    struct list_head *head = &ip->i_dentry;
 		    int all = 1;
 		restart:
+afs_warnuser("about to dlock\n");
 		    DLOCK();
 		    cur = head;
 		    while ((cur = cur->next) != head) {
 			struct dentry *dentry = list_entry(cur, struct dentry, d_alias);
+
+afs_warnuser("looping in NewVC for cur\n");
+
+#if 1
 			if (DCOUNT(dentry)) {
+afs_warnuser("about to TFDC\n");
 			    afs_TryFlushDcacheChildren(dentry);
 			}
+#endif
 
 			if (!DCOUNT(dentry)) {
+afs_warnuser("about to gunlock\n");
 			    AFS_GUNLOCK();
 			    DGET(dentry);
+afs_warnuser("about to dunlock\n");
 			    DUNLOCK();
+afs_warnuser("about to d_drop\n");
 			    d_drop(dentry);
+afs_warnuser("about to dput\n");
 			    dput(dentry);
+afs_warnuser("about to glock\n");
 			    AFS_GLOCK();
+afs_warnuser("about to restart\n");
 			    goto restart;
 			}
 			else {
 			    all = 0;
 			}
 		    }
+afs_warnuser("about to dunlock\n");
 		    DUNLOCK();
+afs_warnuser("past dunlock\n");
 		    if (all) vn --;
 		}
 	    }
