[OpenAFS] FS exited on signal 6

Matthew Cocker matt@cs.auckland.ac.nz
Mon, 11 Oct 2004 08:28:50 +1300


Hi

We have been having some issues with our Linux AFS servers (Debian 
stable, OpenAFS 1.2.11, ext3 vice partitions).

The problem is that, after 6 months of reasonably robust service, they 
have started to give problems. What we see is that the FS will exit (see 
BosLog below), apparently salvage properly, and restart. Unfortunately, 
after the restart (normally within about 24 hours) all access to volumes 
on that server will stop. No errors seem to be reported in the logs, and 
vos listvol may or may not work. Shutting the server down (sometimes a 
hard reset is required), forcing a salvage (or two) and restarting seems 
to fix the problem.

BosLog

Wed Oct  6 12:14:26 2004: fs:file exited on signal 6
Wed Oct  6 12:14:26 2004: fs:vol exited on signal 15
Wed Oct  6 12:35:43 2004: fs:salv exited with code 0
Sat Oct  9 18:02:46 2004: fs:file exited on signal 6
Sat Oct  9 18:02:46 2004: fs:vol exited on signal 15
Sat Oct  9 18:24:07 2004: fs:salv exited with code 0
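
A minimal sketch, assuming the BosLog line format shown above, of pairing 
each fileserver abort (signal 6 is SIGABRT on Linux) with the salvage that 
follows it, to see how long each salvage ran. The names BOSLOG, parse and 
salvage_durations are illustrative only and are not part of OpenAFS.

```python
import re
from datetime import datetime

# Sample input: the BosLog excerpt from this report.
BOSLOG = """\
Wed Oct  6 12:14:26 2004: fs:file exited on signal 6
Wed Oct  6 12:14:26 2004: fs:vol exited on signal 15
Wed Oct  6 12:35:43 2004: fs:salv exited with code 0
Sat Oct  9 18:02:46 2004: fs:file exited on signal 6
Sat Oct  9 18:02:46 2004: fs:vol exited on signal 15
Sat Oct  9 18:24:07 2004: fs:salv exited with code 0
"""

# Assumed line shape: "<ctime timestamp>: fs:<proc> exited on signal N"
# or "... exited with code N".
LINE_RE = re.compile(
    r"^(?P<ts>.+ \d{4}): fs:(?P<proc>\w+) exited "
    r"(?P<how>on signal|with code) (?P<num>\d+)$"
)


def parse(text):
    """Return a list of (timestamp, proc, how, num) for each matching line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # Collapse the double space that ctime-style dates use for
        # single-digit days before parsing.
        ts = " ".join(m.group("ts").split())
        when = datetime.strptime(ts, "%a %b %d %H:%M:%S %Y")
        events.append((when, m.group("proc"), m.group("how"), int(m.group("num"))))
    return events


def salvage_durations(text):
    """Pair each 'file exited on signal 6' with the next 'salv' exit."""
    pairs = []
    abort_time = None
    for when, proc, how, num in parse(text):
        if proc == "file" and how == "on signal" and num == 6:
            abort_time = when
        elif proc == "salv" and abort_time is not None:
            pairs.append((abort_time, when - abort_time))
            abort_time = None
    return pairs


if __name__ == "__main__":
    for started, took in salvage_durations(BOSLOG):
        print(started.isoformat(), "salvage finished after", took)
```

On the excerpt above this reports a salvage of roughly 21 minutes after 
each abort.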


Questions:

Should we schedule, say, a three-monthly salvage on all volumes in the 
hope of avoiding unscheduled outages?

Do other OSes/filesystems have the same issue (we are about to expand AFS 
university-wide and have the chance now to change OS/filesystem if this 
helps)?

Any idea what is causing the above lockup (and the exit on signal 6, 
for that matter)?


Cheers

Matt