[OpenAFS-devel] Problem with OpenAFS fileserver

Onime Clement onime@ictp.trieste.it
Sat, 23 Oct 2004 14:40:23 +0200 (CEST)

I am having some problems with the AFS fileserver version 1.2.11 on Linux
(RedHat 7.3).
I have about 5 fileservers, each with about 100 GB on hardware RAID,
dedicated only to AFS file service.
This morning some clients (at random) started experiencing lockups.
Attempts to reboot them failed while stopping AFS (the client hung), and
after a hard reset the client stopped at "Starting afsd". After logging in
as root, I saw several afsd processes shown as defunct.

Running bos status against all DB servers and fileservers says that the
servers are up and running.
I tried rebooting all DB servers, to no avail. Eventually I started
rebooting the fileservers one after the other, and on one of them the
reboot process stopped while shutting down the fileserver.
Log extract from /usr/afs/logs/FileLog (I can send the full version on request):
Sat Oct 23 10:44:54 2004 With 90 directory buffers; 6772169 reads resulted in 49569 read I/Os
Sat Oct 23 10:44:54 2004 Total Client entries = 363, blocks = 168; Host entries = 260, blocks = 1
Sat Oct 23 10:44:54 2004 There are 363 connections, process size 140156
Sat Oct 23 10:44:54 2004 There are 256 workstations, 0 are active (req in < 15 mins), 8 marked "down"
Sat Oct 23 10:44:54 2004 VShutdown:  shutting down on-line volumes...

But the server was stuck at "Stopping AFS fileserver".
After a hard reset of that server, the clients started working fine. (On
those clients, fs getserverprefs showed that the failed server was high on
the list.)

I am perplexed as to why bos should say the fileserver was working normally
when it seemed to be blocking the clients.
Are there any other checks to see if the fileserver is working fine?
Also, does anyone have any idea as to why this (DoS) is happening and how
to prevent it from happening again?
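One possible extra check, sketched here as an assumption rather than a known fix: bos status only reports whether the fileserver process is running, not whether it is answering requests. The OpenAFS rxdebug utility can query the fileserver's Rx port (7000) directly, and its output includes a "calls waiting for a thread" line. The helper names below (probe_fileserver, calls_waiting) are hypothetical, and the sketch assumes rxdebug is installed and on the PATH.

```python
import re
import subprocess
from typing import Optional


def probe_fileserver(host: str, timeout: float = 30.0) -> str:
    """Run rxdebug against the fileserver's Rx port and return its stdout.

    Hypothetical helper; assumes the OpenAFS rxdebug binary is on PATH.
    Raises subprocess.TimeoutExpired if rxdebug does not finish in time.
    """
    result = subprocess.run(
        ["rxdebug", "-servers", host, "-port", "7000", "-noconns"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout


def calls_waiting(rxdebug_output: str) -> Optional[int]:
    """Extract the 'N calls waiting for a thread' count, or None if absent."""
    match = re.search(r"(\d+) calls? waiting for a thread", rxdebug_output)
    return int(match.group(1)) if match else None
```

For example, calls_waiting(probe_fileserver("fs1.example.org")) (hostname hypothetical) returns the queue length. A count that stays above zero across several probes would suggest the fileserver's worker threads are all busy or blocked, which bos status would not show.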

Clement Onime