[OpenAFS] fileserver goes down overnight

david l goodrich dlg@dsrw.org
Tue, 24 Mar 2009 12:20:06 -0500



The past two nights, I've had one of my AFS fileservers go "down".

I say "down" and not down because it's not totally nonfunctional.

It thinks it's running fine:

sprawl# bos status localhost -localauth
Instance fs, currently running normally.
    Auxiliary status is: file server running.
sprawl# bos version
openafs 1.4.6
sprawl#

but none of the clients (running 1.4.8 and 1.4.6) are able to
connect to the volumes on the server, despite believing that
all servers are running:

dlg@chaos:~$ fs checkservers -fast -all
All servers are running.
dlg@chaos:~$ vos listvol sprawl
Could not fetch the list of partitions from the server
Possible communication failure
Error in vos listvol command.
Possible communication failure
dlg@chaos:~$

I've turned up logging on sprawl's fileserver, but I'm not really
sure what I should be looking for.  Any help would be
appreciated.
  --david



