[OpenAFS-devel] Re: [OpenAFS] AFS 1.2.8 fileserver Failing in GetClient()

Derrick J Brashear shadow@dementia.org
Fri, 4 Apr 2003 03:08:09 -0500 (EST)


On Thu, 3 Apr 2003, Douglas E. Engert wrote:

> The file servers have stayed up since we put on the 1.2.9-rc4 fileserver,
> and the two machines which are triggering the VBUSYING messages are running,
> apparently without any problems. They are all production machines, so I
> don't want to touch them if possible.
>
> We are still trying to figure out what is triggering it, so we can try and
> reproduce it on our third AFS server that is not as critical. If we can, then
> we can try your patch on that server. We think the multi-homed client
> has something to do with it. Upgrading the cache manager to 1.2.8 did not help.

I think I have the answer to this, but I'm hoping someone will comment
if I share:

Two calls on the same conn come in. The first creates a new client and
gets to pr_GetCPS in h_FindClient_r, after H_UNLOCK. The second call then
proceeds through h_FindClient_r, coming past after the getcps call won
(so it doesn't need to block there), and sets a tcon for the client.
Then the first call unblocks, comes through, frees the old tcon client,
and sets a new one, which happens to be the same client: the call to
rx_SetSpecific frees the previous value, and the previous value is the
client we are about to set.

perhaps the right answer is to see if oldClient is the same client as
client (at the end of h_FindClient_r) and if so, not reset it.

maybe something like this:
--- src/viced/host.c    2 Apr 2003 00:23:58 -0000       1.7.2.16
+++ src/viced/host.c    4 Apr 2003 08:07:16 -0000
@@ -1464,11 +1464,19 @@
      * the RPC from the other client structure's rock.
      */
     if (oldClient = (struct client *) rx_GetSpecific(tcon, rxcon_client_key)) {
-       oldClient->tcon = (struct rx_connection *) 0;
-       /* rx_SetSpecific will be done immediately below */
+       if (oldClient != client) {
+           /* rx_SetSpecific will be done immediately below */
+           oldClient->tcon = (struct rx_connection *) 0;
+           client->tcon = tcon;
+           rx_SetSpecific(tcon, rxcon_client_key, client);
+       }
+       /* the implicit else here is to not reset the same client we
+          already had. */
+    } else {
+       client->tcon = tcon;
+       rx_SetSpecific(tcon, rxcon_client_key, client);
     }
-    client->tcon = tcon;
-    rx_SetSpecific(tcon, rxcon_client_key, client);
+
     ReleaseWriteLock(&client->lock);

     return client;
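
To make the failure mode concrete, here is a toy model of the two paths.
It is only a sketch, not the real rx or viced code: struct conn, struct
client, toy_set_specific, attach_unguarded and attach_guarded are all
made-up names, and "released" is a counter standing in for whatever the
key's destructor would do to the old value. The one assumption carried
over from the description above is that storing a new per-connection
value releases the previous one.

```c
#include <assert.h>
#include <stddef.h>

struct conn;

/* Hypothetical stand-in for a viced client entry. */
struct client {
    struct conn *tcon;   /* connection this client is attached to */
    int released;        /* times the "destructor" ran on this client */
};

/* Hypothetical stand-in for an rx connection with one specific slot. */
struct conn {
    struct client *specific;
};

/* Toy model of the assumed behaviour: setting a new value releases
 * whatever was stored before, even if it is the same pointer. */
static void toy_set_specific(struct conn *tcon, struct client *client)
{
    assert(tcon != NULL && client != NULL);
    if (tcon->specific != NULL)
        tcon->specific->released++;
    tcon->specific = client;
}

/* Old logic: always detach the old client and set the new one. */
static void attach_unguarded(struct conn *tcon, struct client *client)
{
    struct client *oldClient = tcon->specific;
    if (oldClient != NULL)
        oldClient->tcon = NULL;
    client->tcon = tcon;
    toy_set_specific(tcon, client);
}

/* Patched logic: leave things alone if the client is already set. */
static void attach_guarded(struct conn *tcon, struct client *client)
{
    struct client *oldClient = tcon->specific;
    if (oldClient == client)
        return;                  /* same client, nothing to reset */
    if (oldClient != NULL)
        oldClient->tcon = NULL;
    client->tcon = tcon;
    toy_set_specific(tcon, client);
}
```

Attaching the same client twice through attach_unguarded leaves its
released count at 1 while the connection still points at it, which is
the situation described above. Through attach_guarded the second attach
is a no-op and the count stays at 0.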