[OpenAFS] Fileserver loses contact with itself

Derrick J Brashear shadow@dementia.org
Tue, 18 Nov 2003 18:15:30 -0500 (EST)


On Tue, 18 Nov 2003, Tom Fitzgerald wrote:

> The problem persists for 5 minutes after the fileserver
> process is restarted, then goes away with no further action.
>
> This is with OpenAFS 1.2.9 on a heavily modified RedHat 9
> system, Linux 2.4.20 OS.

Is anyone running a fileserver without issue on RedHat 9?
I suspect that either something about our pthread usage or, more likely,
NPTL itself is broken. The pthreaded fileserver works fine on numerous
other platforms; surely if we were using pthreads incorrectly, at least
one of them would break this way.

You can try three things:

1) Downgrade to the LWP (non-pthreads) fileserver: from a built OpenAFS
tree, copy src/viced/fileserver out and run that.
2) Find the environment variable you need to set to get the old pthreads
support (I don't know offhand what it is, but I do know it exists).
3) Assuming you have a debug (cc -g) build running, attach with gdb while
the server is hung, and execute the following:

    thread apply all where

Save the output to a file, put it somewhere public, and send mail about
it.
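
For option 3, something like the following might save typing. It is only a
sketch: it assumes a gdb recent enough to accept the -batch, -ex and -p
options (older versions need the interactive attach instead), and the
function name dump_threads and the output path are made up for
illustration.

```shell
#!/bin/sh
# Sketch: dump a backtrace of every thread in a hung fileserver.
# Assumption: gdb supports -batch, -ex and -p. If yours does not,
# attach interactively and type "thread apply all where" by hand.

dump_threads() {
    pid="$1"    # process ID of the hung fileserver
    out="$2"    # file that receives the backtraces
    # Attach, run the one command, detach, and capture everything.
    gdb -batch -ex "thread apply all where" -p "$pid" > "$out" 2>&1
}

# Hypothetical usage (run as root while the server is hung):
#   dump_threads 12345 /tmp/fileserver-threads.txt
```

Without a -g build the output will still list the threads, but the frames
will be much less useful.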