[OpenAFS] FS exited on signal 6
Matthew Cocker
matt@cs.auckland.ac.nz
Mon, 11 Oct 2004 08:28:50 +1300
Hi
We have been having some issues with our Linux AFS servers (Debian
stable, OpenAFS 1.2.11, ext3 vice partitions).
The problem is that, after 6 months of reasonably robust service, they
have started to give problems. What we see is that the FS will exit (see
BosLog below), apparently salvage properly, and restart. Unfortunately, after
the restart (normally within about 24 hours) all access to volumes on
that server will stop. No errors seem to be reported in the logs, and vos
listvol may or may not work. Shutting the server down (sometimes a hard
reset is required), forcing a salvage (or two) and restarting seems to fix
the problem.
BosLog
Wed Oct 6 12:14:26 2004: fs:file exited on signal 6
Wed Oct 6 12:14:26 2004: fs:vol exited on signal 15
Wed Oct 6 12:35:43 2004: fs:salv exited with code 0
Sat Oct 9 18:02:46 2004: fs:file exited on signal 6
Sat Oct 9 18:02:46 2004: fs:vol exited on signal 15
Sat Oct 9 18:24:07 2004: fs:salv exited with code 0
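For anyone reading the log above: on Linux, signal 6 is SIGABRT and signal 15 is SIGTERM. Below is a minimal sketch, assuming Python 3 is at hand, of how the time between each fs:file exit and the matching fs:salv completion could be pulled out of these lines. The function names (parse_boslog, salvage_seconds) and the SIGNAL_NAMES table are made up for illustration and are not part of OpenAFS; the regular expression only covers the line format shown in this excerpt.

```python
import re
from datetime import datetime

# Linux numbering for the two signals that appear in the excerpt (assumption:
# the server is Linux, as stated in the post).
SIGNAL_NAMES = {6: "SIGABRT", 15: "SIGTERM"}

# The six BosLog lines quoted in the post, copied verbatim.
BOSLOG = """\
Wed Oct 6 12:14:26 2004: fs:file exited on signal 6
Wed Oct 6 12:14:26 2004: fs:vol exited on signal 15
Wed Oct 6 12:35:43 2004: fs:salv exited with code 0
Sat Oct 9 18:02:46 2004: fs:file exited on signal 6
Sat Oct 9 18:02:46 2004: fs:vol exited on signal 15
Sat Oct 9 18:24:07 2004: fs:salv exited with code 0
"""

# Matches only the "exited on signal N" / "exited with code N" lines above.
LINE_RE = re.compile(
    r"^(?P<ts>.+ \d{4}): (?P<proc>\S+) exited "
    r"(?P<kind>on signal|with code) (?P<num>\d+)$"
)


def parse_boslog(text):
    """Return one dict per recognised line; unrecognised lines are skipped."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        num = int(m["num"])
        events.append({
            "time": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"),
            "proc": m["proc"],
            "signal": num if m["kind"] == "on signal" else None,
            "code": num if m["kind"] == "with code" else None,
        })
    return events


def salvage_seconds(events):
    """Seconds from each fs:file signal exit to the next fs:salv exit."""
    durations = []
    crash_time = None
    for ev in events:
        if ev["proc"] == "fs:file" and ev["signal"] is not None:
            crash_time = ev["time"]
        elif ev["proc"] == "fs:salv" and crash_time is not None:
            durations.append(int((ev["time"] - crash_time).total_seconds()))
            crash_time = None
    return durations


if __name__ == "__main__":
    events = parse_boslog(BOSLOG)
    for ev in events:
        if ev["signal"] is not None:
            name = SIGNAL_NAMES.get(ev["signal"], "unknown")
            print(ev["time"], ev["proc"], "signal", ev["signal"], name)
    print("salvage durations (s):", salvage_seconds(events))
```

Run against the excerpt, this gives salvage times of roughly 21 minutes for both incidents, which only confirms what the timestamps already show. It says nothing about why the fileserver aborted.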
Questions:
Should we schedule, say, a three-monthly salvage on all volumes in the hope of
avoiding unscheduled outages?
Do other OS/filesystems have the same issue (we are about to expand AFS
university-wide and have the chance now to change OS/filesystem if this
helps)?
Any idea what is causing the above lock-up (and the exit on signal 6,
for that matter)?
Cheers
Matt