[OpenAFS] fileserver goes down overnight

david l goodrich dlg@dsrw.org
Tue, 24 Mar 2009 13:13:08 -0500


On Tue, Mar 24, 2009 at 07:02:29PM +0100, Anders Magnusson wrote:
> david l goodrich wrote:
> > On Tue, Mar 24, 2009 at 10:39:24AM -0700, Russ Allbery wrote:
> >
> >> david l goodrich <dlg@dsrw.org> writes:
> >>
> >>
> >>> The past two nights, I've had one of my AFS fileservers go "down".
> >>>
> >>> I say "down" and not down because it's not totally nonfunctional.
> >>>
> >>> It thinks it's running fine:
> >>>
> >>> sprawl# bos status localhost -localauth
> >>> Instance fs, currently running normally.
> >>>     Auxiliary status is: file server running.
> >>>
> >> bos status -long is generally more useful.  However:
> >>
> > Can do:
> > sprawl# bos status localhost -localauth -long
> > Instance fs, (type is fs) currently running normally.
> >     Auxiliary status is: file server running.
> >     Process last started at Mon Mar 23 17:33:57 2009 (3 proc starts)
> >     Last exit at Mon Mar 23 17:33:57 2009
> >     Command 1 is '/usr/pkg/libexec/openafs/fileserver'
> >     Command 2 is '/usr/pkg/libexec/openafs/volserver'
> >     Command 3 is '/usr/pkg/libexec/openafs/salvager'
> >
> > sprawl# ps auxw | grep /openafs/
> > root   376  0.0  0.0 2316     4 ?       DW    5:33PM 0:00.83 /usr/pkg/libexec/openafs/volserver
> > root   727  0.0  0.0 8664  2384 ?       IW<a  5:33PM 0:18.29 /usr/pkg/libexec/openafs/fileserver
> >
> If the D flag in the ps line means the same on your system as on mine,
> you might have a problem.  D usually stands for a process waiting for
> I/O, and if it doesn't leave that state, it means the I/O never
> completes.  The W flag normally stands for swapped out, and you also
> seem to have nothing of the process resident.

`man ps` would seem to confirm this:
         D      Marks a process in disk (or other short term, uninter-
                ruptible) wait.
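
For anyone wanting to check this across a whole process list, here is a
minimal Python sketch that picks out entries whose STAT field starts
with D.  It assumes the BSD-style `ps auxw` column layout shown above
(USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND) and a STARTED
field with no embedded space; the function name `stuck_processes` and
the `SAMPLE` text are illustrative only, with the two lines copied from
the output earlier in this thread.

```python
def stuck_processes(ps_output):
    """Return (pid, stat, command) for each row whose STAT starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines():
        # Assumed layout: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and anything malformed
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D"):
            stuck.append((pid, stat, command))
    return stuck


# Sample rows taken from the ps output quoted above.
SAMPLE = """\
root   376  0.0  0.0 2316     4 ?       DW    5:33PM 0:00.83 /usr/pkg/libexec/openafs/volserver
root   727  0.0  0.0 8664  2384 ?       IW<a  5:33PM 0:18.29 /usr/pkg/libexec/openafs/fileserver
"""

if __name__ == "__main__":
    for pid, stat, command in stuck_processes(SAMPLE):
        print(pid, stat, command)
```

A single snapshot only shows that a process was in disk wait at that
instant.  Running the check a few times some seconds apart and seeing
the same PID each time is what would suggest the wait never completes.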

>
> You don't have any hardware complaints in messages?

Well, it's a Xen domU, so there's nothing in the server's dmesg.
I don't see anything on the dom0 to suggest hardware problems,
either.
  --david


>
> -- Ragge
