[OpenAFS] Errors: Fileserver freezes, Volumes contains orphans

Derrick J Brashear shadow@dementia.org
Sat, 8 Feb 2003 05:25:33 -0500 (EST)


On Thu, 30 Jan 2003, Derrick J Brashear wrote:

> On Thu, 30 Jan 2003, Rubino Geiß wrote:
> 
> > This morning one of our fileservers (OpenAFS 1.2.8, rh8.0) stopped serving
> > files. Doing bos status, ping, rxdebug and looking at the log files at most
> > everything seemed to be ok. Only the BosLog showed constantly restarting
> > file / salv processes.
> 
> I'll guess that this also was "main thread of fileserver died"
> 
> > Can anybody tell us how to get rid of these nasty features ;)
> 
> Not unless you can actually get us a core or something else to work from.

OK, so the goal is to get a core. The problem is that the pthreaded
fileserver on Linux, like any other pthreaded process, doesn't drop a core
and take down the whole process when it dies.

I'll suggest that you can get us a core by building and running a kernel
with a patch. The instructions are here:
http://www-124.ibm.com/linux/projects/mtcoredumps/

However, a more current version of the patch is in this message:
http://www.cs.helsinki.fi/linux/linux-kernel/2002-50/0389.html

You'll need to save the message as text and run munpack to get the patch out.
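
If munpack isn't at hand, the same extraction can be approximated with
Python's standard email module. This is only a rough sketch, not what I
used: it assumes the saved file is a complete MIME message with its headers
intact, and the function names extract_attachments and save_attachments are
made up for illustration.

```python
import email
from email import policy
from pathlib import Path


def extract_attachments(raw_message):
    """Return {filename: decoded bytes} for every named MIME part.

    Assumption: raw_message is a full RFC 822 / MIME message, headers
    included. Parts without a filename (e.g. the plain body) are skipped.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    found = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        name = part.get_filename()
        if name is None:
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True) or b""
        # keep only the basename so a hostile name cannot escape out_dir
        found[Path(name).name] = payload
    return found


def save_attachments(message_path, out_dir="."):
    """Write each attachment of the saved message into out_dir."""
    parts = extract_attachments(Path(message_path).read_bytes())
    written = []
    for name, payload in parts.items():
        target = Path(out_dir) / name
        target.write_bytes(payload)
        written.append(target)
    return written
```

Calling save_attachments on the saved message file should leave the patch
in the current directory under whatever name the sender gave it.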

Also, there's an improved version in 2.5 kernels, which Red Hat has
backported to their Rawhide kernels, though the backport didn't work as of
about the same time:
http://www.cs.helsinki.fi/linux/linux-kernel/2002-50/0473.html

If you do use a Rawhide kernel, make your life easier and export
sys_call_table before you build the kernel.