[OpenAFS-devel] deadlock in host.c (fileserver)

Rainer Toebbicke rtb@pclella.cern.ch
Tue, 20 Jun 2006 17:06:13 +0200


This is a multi-part message in MIME format.
--------------090902000102060704030305
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Jeffrey Altman wrote:
> Rainer:
> 
> Thank you for the patch.  The second block of your patch is indeed a
> deadlock that must be fixed.  The first block of your patch affects
> code that has been changed on the 1.4.x branch.  Please examine the
> attached patch.
> 

No, I'm afraid that's not sufficient: after dropping and re-acquiring 
the locks it may well be that the rxcon_client has changed again.

Therefore you need the

while (... && ... && ...) / continue instead of the if (...) construct.

I came up with the patch from the 1.4.1 base and wasn't aware
of the rxr_CidOf() changes.

Attached is the patch against 1.57.2.36, but re-applied through cut & 
paste and neither tested nor even compiled.


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155

--------------090902000102060704030305
Content-Type: text/plain;
 name="patch_client_deadlock"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="patch_client_deadlock"

--- /tmp/host.c.o141	2006-06-20 16:55:02.000000000 +0200
+++ /tmp/host.c	2006-06-20 17:00:30.000000000 +0200
@@ -1758,9 +1758,12 @@
      * required).  So, before setting the RPC's rock, we should disconnect
      * the RPC from the other client structure's rock.
      */
-    oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key);
-    if (oldClient && oldClient->sid == rxr_CidOf(tcon)
-	&& oldClient->VenusEpoch == rxr_GetEpoch(tcon)) {
+    while (
+    	(oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key))
+	&& oldClient != client
+	&& oldClient->sid == rxr_CidOf(tcon)
+	&& oldClient->VenusEpoch == rxr_GetEpoch(tcon)
+    ) {
 	char hoststr[16];
 	if (!oldClient->deleted) {
 	    /* if we didn't create it, it's not ours to put back */
@@ -1782,9 +1785,12 @@
 		FreeCE(client);
 		created = 0;
 	    } 
-	    ObtainWriteLock(&oldClient->lock);
 	    oldClient->refCount++;
+	    H_UNLOCK;
+	    ObtainWriteLock(&oldClient->lock);
+	    H_LOCK;
 	    client = oldClient;
+	    continue;                           /* all locks dropped, re-check */
 	} else {
 	    ViceLog(0, ("FindClient: deleted client %x(%x) already had conn %x (host %s:%d), stolen by client %x(%x)\n", 
 			oldClient, oldClient->sid, tcon, 

--------------090902000102060704030305--