[OpenAFS-devel] volserver hangs possible fix

Horst Birthelmer horst@riback.net
Thu, 7 Apr 2005 10:17:38 +0200


On Apr 7, 2005, at 9:27 AM, Marcus Watts wrote:

> Tom Keiser <tkeiser@gmail.com> writes:
> ...
>>


...
>
> This is from linux, and the core dump I had was from "gcore" in gdb.
> I didn't think to get one from the fileserver as well, but I wish I had
> now.
>

Well, me, too. :-)
I saw that kind of behavior a few times on AIX servers and monitored 
the traffic. There were no callbacks the system was waiting for. I 
didn't have a core dump either. Back then I assumed it must have been 
some weirdness in the file handle handling, since that's what the 
processing of FSYNC_askfs implies, apart from your point. The complete 
FSYNC communication stopped for no "good" reason. I never got to the 
bottom of it.

...

There are a few places in the code where the execution of some 
FSYNC_xxx calls can stop, but nothing I considered suspicious enough 
to investigate further.

> On a somewhat separate note: this logic is using an inet domain socket
> which is in fact reachable via the network (at least by other hosts
> on the same network segment).  It should probably be using a unix domain
> socket instead, and that could also simplify some of the other logic.
>

This was indeed pointed out some time ago on this list. But the issue 
never came up again, so the whole thing probably got forgotten.
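
To illustrate the unix domain socket idea, here is a minimal sketch of 
a listener and a client that talk over a socket file instead of an 
inet port. This is not the existing FSYNC code: the function names 
(open_local_listener, connect_local) and any socket path passed to 
them are made up for the example, and a real change would also have 
to decide where the socket file lives and which permissions it gets.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill a sockaddr_un for `path`; returns -1 if the path does not fit. */
static int fill_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);
    return 0;
}

/* Hypothetical helper: listen on a unix domain socket at `path`.
 * Returns the listening descriptor, or -1 on error. */
int open_local_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (fill_addr(&addr, path) < 0)
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    unlink(path);               /* remove a stale socket file, if any */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect to the listener at `path`.
 * Returns the connected descriptor, or -1 on error. */
int connect_local(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (fill_addr(&addr, path) < 0)
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Since the endpoint is a file system path rather than an address and 
port, other hosts on the network segment cannot reach it, and access 
can be limited with ordinary file permissions.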


Horst