[OpenAFS-devel] volserver hangs possible fix
Horst Birthelmer
horst@riback.net
Thu, 7 Apr 2005 10:17:38 +0200
On Apr 7, 2005, at 9:27 AM, Marcus Watts wrote:
> Tom Keiser <tkeiser@gmail.com> writes:
> ...
>>
...
>
> This is from linux, and the core dump I had was from "gcore" in gdb.
> I didn't think to get one from the fileserver as well, but I wish I had
> now.
>
Well, me, too. :-)
I saw that kind of behavior a few times on AIX servers and monitored
the traffic. There were no callbacks the system was waiting for. I
didn't have a core dump either. Back then I assumed it must have been
some weirdness in the file handle handling, since, besides your point,
that's what the processing of FSYNC_askfs implies. The complete
FSYNC communication stopped for no "good" reason. I never got to the
bottom of it.
...
There are a few places in the code where the execution of some
FSYNC_xxx calls can be stopped, but nothing I considered suspicious
enough to investigate further.
> On a somewhat separate note: this logic is using an inet domain socket
> which is in fact reachable via the network (at least by other hosts
> on the same network segment). It should probably be using a unix
> domain socket instead, and that could also simplify some of the other
> logic.
>
This was indeed pointed out some time ago on this list. But the issue
never came up again, so the whole thing may have been forgotten.
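
As a rough illustration of the unix domain socket idea quoted above,
here is a minimal sketch in C. It is not OpenAFS code: listen_unix()
and any socket path passed to it are hypothetical names chosen for
this example, and the real FSYNC setup may need to look quite
different.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not an OpenAFS function): create a listening
 * unix domain socket at `path`. Returns the descriptor, or -1 on error. */
int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    mode_t old_mask;
    int fd, rc;

    /* sun_path is a fixed-size array; refuse paths that do not fit. */
    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* Remove a stale socket file left over from a previous run. */
    unlink(path);

    /* Create the socket file without group/other permissions, so only
     * processes running as the same user can connect to it. */
    old_mask = umask(077);
    rc = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    umask(old_mask);

    if (rc < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because such a socket is a filesystem object, access is governed by
file permissions and it cannot be reached from other hosts, which is
the point of the suggestion in the quoted text.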
Horst