[OpenAFS-devel] deadlock on linux-2.4

Pavel Semerad semerad@ss1000.ms.mff.cuni.cz
Wed, 5 Jun 2002 16:45:04 +0200


  I have found more info and created a temporary hack that fixes it,
but it is probably not the correct solution.

afsd calls afs_StoreAllSegments and downgrades vcache->lock to SHARED.
     Then it obtains a SHARED dcache->lock. It then blocks when trying
     to upgrade vcache->lock to WRITE (because cat holds a READ lock).
cat  calls afs_UFSRead and obtains a READ vcache->lock there. Then
     afs_GetDCache is called and it blocks when trying to obtain a
     SHARED dcache->lock (because afsd holds a SHARED lock).

I don't know why other architectures are OK; maybe afs_StoreAllSegments
and afs_UFSRead cannot run concurrently on them.
This patch (hack) works around it on linux-2.4:


--- src/afs/afs_segments.c Mar 2002 08:53:36 -0000	1.1.1.9
+++ src/afs/afs_segments.c Jun 2002 13:50:09 -0000
@@ -322,6 +322,9 @@ afs_StoreAllSegments(avc, areq, sync)
 	extern int afs_defaultAsynchrony;
 	XSTATS_DECLS
 
+#ifdef AFS_LINUX24_ENV
+	UpgradeSToWLock(&avc->lock,1000); // hack, deadlock vcache->lock and dcache->lock (afs_UFSRead gets READ vcache->lock and tries SHARED dcache->lock, which is SHARED locked on following lines)
+#endif
 	for (bytes = 0, j = 0; !code && j<=high; j++) {
 	  if (dcList[j]) {
 	    ObtainSharedLock(&(dcList[j]->lock), 629);
@@ -621,6 +624,9 @@ restart:
 	    afs_PutDCache(tdc);
 	  }
 	}
+#ifdef AFS_LINUX24_ENV
+	ConvertWToSLock(&avc->lock); // hack, deadlock vcache->lock and dcache->lock
+#endif
       } /* if (j) */
 
     minj += NCHUNKSATONCE;



Pavel

> 
> Hi,
> there is a deadlock situation in openafs for linux-2.4 (at least in
> vanilla kernel 2.4.18 and CVS openafs version). linux-2.2, Solaris 8
> and Irix 6.5 seem to be OK.
> The mc alias from RedHat 7.3 sometimes triggered it when exiting mc,
> and this command reproduces it:
> 
> N=0; while true; do N=$(($N+1)); echo $N; (cp /etc/profile aaa &); cat aaa >/dev/null; done
> 
> Here is the cmdebug output:
> ---
> ** Cache entry @ 0xc49a57d8 for 1.536877573.514.10221
> locks: (writer_waiting, upgrade_locked(pid:1334 at:121), 1 read_locks(pid:1395), 2 waiters)
> 1003 bytes  DV 4329 refcnt 2
> callback c36c9100   expires 1023138509
> 2 opens     1 writers
> normal file
> states (0x21), stat'd
> ---
> pid 1334 is afsd, 1395 is cat
> 
> afsd is holding a shared lock on vcache->lock and is trying to obtain a
> write lock in afs/afs_segments.c line 542.
> cat called afs_UFSRead, obtained a read lock there, called afs_GetDCache
> and then blocks somewhere. I don't know where. Could it be a linux-2.4 VM
> issue, or does anybody have any ideas?
> 
> Pavel Semerad