[OpenAFS] file server occasionally stops serving
Tue, 22 Jun 2004 15:12:44 -0400
We're having a problem on one of our AFS file servers where the
fileserver process will occasionally become completely unresponsive.
bos status indicates that the fileserver process is running normally,
but fs checkservers indicates that the server is unavailable and any
attempts to access volumes that it serves fail.
When this happens, there is usually one fileserver process spinning,
using 100% of the available CPU cycles (though in at least one case
there was not). bos shutdown cleanly shuts down the non-fileserver
processes, but won't shut down fileserver. bos status following a bos
shutdown indicates that the fileserver process is shutting down, but it
never actually finishes. Each time this has happened, I've had to log in
to the server and send the fileserver process a KILL signal. Obviously
this is not a good thing, nor is it good that the server keeps getting
itself into this state to begin with.
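To catch the spinning state described above before users notice, something
like the following could be run periodically on the server. This is only a
rough sketch, not anything OpenAFS provides: the function names
find_spinning and snapshot and the 90% threshold are made up for
illustration, and it assumes the process shows up in ps under the command
name "fileserver". Note that procps ps reports %CPU averaged over the life
of the process, so a fileserver that has only just started spinning may not
cross the threshold right away.

```python
import subprocess


def find_spinning(ps_output, name="fileserver", threshold=90.0):
    """Return (pid, cpu) pairs for processes named `name` whose %CPU
    column is at or above `threshold`.

    `ps_output` is expected to be the text printed by
    `ps -eo pid,pcpu,comm` (one header row, then one row per process).
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # blank or malformed row
        pid_text, cpu_text, comm = fields
        try:
            pid = int(pid_text)
            cpu = float(cpu_text)
        except ValueError:
            continue  # row did not contain numeric pid / %CPU
        if comm.strip() == name and cpu >= threshold:
            hits.append((pid, cpu))
    return hits


def snapshot():
    """Capture the current process table with pid, %CPU and command name."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, cpu in find_spinning(snapshot()):
        print(f"fileserver pid {pid} at {cpu:.1f}% CPU")
```

The parsing is kept separate from the ps call so it can be checked against
saved ps output from a time when the server was actually wedged.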
Since this only ever seems to happen on one of the AFS servers, and
they're all configured identically, I suspect that the problem is
somehow specific to the data on the machine that fails. Might there be a
corrupt volume? Could it be that the underlying OS filesystem on one of
the partitions is corrupt? The filesystem is ext3 and the OS is Debian
on a Linux 2.4.26 kernel, using the openafs-fileserver 1.2.11-0.woody1
packages available on the openafs.org web site.
If anybody can suggest a course of action at this point, or a means of
gathering more information about the cause of the problem, I'd very much
appreciate it.
Noah Meyerhans System Administrator
MIT Computer Science and Artificial Intelligence Laboratory