[OpenAFS-devel] deadlock in host.c (fileserver)

Rainer Toebbicke rtb@pclella.cern.ch
Tue, 20 Jun 2006 12:01:52 +0200


This is a multi-part message in MIME format.
--------------040703080706040605060006
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

There is a possible deadlock in host.c in the fileserver code:

around line 1882, where the lock for client is dropped in order to 
lock oldClient, it is not impossible (e.g. through prfail == 2, or 
just if nothing exciting happened through pr_GetCPS) that client == 
oldClient.

In itself "switching" at that point is already fishy.

Worse however, the H_LOCK is held. When the lock for client is 
dropped, another thread can grab it and immediately after that queue 
for the H_LOCK. Since we hold H_LOCK we'll deadlock if we now wait for 
the oldClient's lock since the other thread's got it.

The attached patch  ...

1. correctly handles client == oldClient

2. drops the H_LOCK when switching to oldClient, with the reference
count updated before,

3. does all this in a re-check loop since both locks are dropped at 
one point.

(Bcc'ed to openafs-bugs)

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155

--------------040703080706040605060006
Content-Type: text/plain;
 name="patch_client_deadlock"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="patch_client_deadlock"

--- openafs/src/viced/host.c.o141	2006-04-26 14:55:10.000000000 +0200
+++ openafs/src/viced/host.c	2006-06-19 10:42:20.000000000 +0200
@@ -1853,8 +1853,11 @@
      * required).  So, before setting the RPC's rock, we should disconnect
      * the RPC from the other client structure's rock.
      */
-    oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key);
-    if (oldClient && oldClient->tcon == tcon) {
+    while (
+    	(oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key))
+        && oldClient != client
+	&& oldClient->tcon == tcon
+    ) {
 	char hoststr[16];
 	if (!oldClient->deleted) {
 	    /* if we didn't create it, it's not ours to put back */
@@ -1879,9 +1882,12 @@
 		FreeCE(client);
 		created = 0;
 	    } 
-	    ObtainWriteLock(&oldClient->lock);
 	    oldClient->refCount++;
+	    H_UNLOCK;
+	    ObtainWriteLock(&oldClient->lock);
+	    H_LOCK;
 	    client = oldClient;
+	    continue;				/* all locks dropped, re-check */
 	} else {
 	    rx_PutConnection(oldClient->tcon);
 	    oldClient->tcon = (struct rx_connection *)0;

--------------040703080706040605060006--