[OpenAFS] fileserver goes down overnight

Russ Allbery rra@stanford.edu
Tue, 24 Mar 2009 10:39:24 -0700

david l goodrich <dlg@dsrw.org> writes:

> The past two nights, I've had one of my AFS fileservers go "down".
> I say "down" and not down because it's not totally nonfunctional.
> It thinks it's running fine:
> sprawl# bos status localhost -localauth
> Instance fs, currently running normally.
>     Auxiliary status is: file server running.

bos status -long is generally more useful.  However:

> but none of the clients (running 1.4.8 and 1.4.6) are able to
> connect to the volumes on the server, despite believing that 
> dlg@chaos:~$ fs checkservers -fast -all
> All servers are running.
> dlg@chaos:~$ vos listvol sprawl
> Could not fetch the list of partitions from the server
> Possible communication failure
> Error in vos listvol command.
> Possible communication failure

I suspect your volserver either died or went unresponsive.  What version
of OpenAFS is the fileserver running?  Is there anything incriminating in
VolserLog or FileLog?
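
For what it's worth, here is a rough sketch of how those checks could be
run on the server itself.  The scan_afs_log helper and the AFS_LOGDIR
variable are made up for illustration and are not part of OpenAFS, the
grep keywords are only a guess at what "incriminating" might look like,
and /usr/afs/logs assumes a Transarc-style install (packaged installs
often log elsewhere, for example /var/log/openafs on Debian).

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenAFS: print the numbered lines of a
# server log that contain words which often accompany a crash or a hang.
scan_afs_log() {
    # $1 = path to a log file such as VolserLog or FileLog
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    grep -i -n -E 'abort|assert|panic|fail|error|timed out' "$1"
}

# Assumed log directory; override with AFS_LOGDIR if your install differs.
LOGDIR="${AFS_LOGDIR:-/usr/afs/logs}"

# The more detailed status report, if bos is available on this machine.
if command -v bos >/dev/null 2>&1; then
    bos status localhost -long -localauth
fi

# Scan the two logs mentioned above, skipping any that are not readable.
for log in VolserLog FileLog; do
    if [ -r "$LOGDIR/$log" ]; then
        echo "== $log =="
        scan_afs_log "$LOGDIR/$log" || echo "(no matching lines)"
    fi
done
```

If you would rather not log in to the server, something like
"bos getlog -server sprawl -file VolserLog" run with admin tokens should
fetch the same log remotely, provided the bosserver is still answering.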

