[OpenAFS-devel] deadlock in host.c (fileserver)
Rainer Toebbicke
rtb@pclella.cern.ch
Tue, 20 Jun 2006 12:01:52 +0200
This is a multi-part message in MIME format.
--------------040703080706040605060006
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
There is a possible deadlock in host.c in the fileserver code:
around line 1882, where the lock for client is dropped in order to
lock oldClient, it is not impossible (e.g. through prfail == 2, or
just if nothing exciting happened through pr_GetCPS) that client ==
oldClient.
In itself "switching" at that point is already fishy.
Worse however, the H_LOCK is held. When the lock for client is
dropped, another thread can grab it and immediately after that queue
for the H_LOCK. Since we hold H_LOCK we'll deadlock if we now wait for
the oldClient's lock since the other thread's got it.
The attached patch ...
1. correctly handles client == oldClient
2. drops the H_LOCK when switching to oldClient, with the reference
count updated before,
3. does all this in a re-check loop since both locks are dropped at
one point.
(Bcc'ed to openafs-bugs)
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155
--------------040703080706040605060006
Content-Type: text/plain;
name="patch_client_deadlock"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="patch_client_deadlock"
--- openafs/src/viced/host.c.o141 2006-04-26 14:55:10.000000000 +0200
+++ openafs/src/viced/host.c 2006-06-19 10:42:20.000000000 +0200
@@ -1853,8 +1853,11 @@
* required). So, before setting the RPC's rock, we should disconnect
* the RPC from the other client structure's rock.
*/
- oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key);
- if (oldClient && oldClient->tcon == tcon) {
+ while (
+ (oldClient = (struct client *)rx_GetSpecific(tcon, rxcon_client_key))
+ && oldClient != client
+ && oldClient->tcon == tcon
+ ) {
char hoststr[16];
if (!oldClient->deleted) {
/* if we didn't create it, it's not ours to put back */
@@ -1879,9 +1882,12 @@
FreeCE(client);
created = 0;
}
- ObtainWriteLock(&oldClient->lock);
oldClient->refCount++;
+ H_UNLOCK;
+ ObtainWriteLock(&oldClient->lock);
+ H_LOCK;
client = oldClient;
+ continue; /* all locks dropped, re-check */
} else {
rx_PutConnection(oldClient->tcon);
oldClient->tcon = (struct rx_connection *)0;
--------------040703080706040605060006--