[OpenAFS] Problems on AFS Unix clients after AFS fileserver moves
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 01 Sep 2005 21:07:38 -0400
On Tuesday, August 09, 2005 10:58:22 AM -0500 Rich Sudlow <rich@nd.edu>
wrote:
> We've been having problems with our cell for the last couple
> years with AFS clients after fileservers are taken out of service.
> Before that things seemed to work ok when doing fileserver moves and
> rebuilding. All data was moved off the fileserver but the clients
> still seem to have some need to talk to it. In the past the AFS
> admins have left the fileservers up and empty for a number of
> days to try to resolve this issue - but it doesn't resolve the
> issue.
That's because there is no "issue" here. What you've just described is the
result of the cache manager's normal checkservers loop, in which it pings
_every server it has ever had to talk to_ every 5 minutes or so, to see if
it is still up (or down, as the case may be). This is also why 'fs
checkservers' is reporting the server down -- it reports on every server
that client has contacted since startup.
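
To make the checkservers behavior concrete, here is a toy model in Python.
It is not OpenAFS code: the class ServerProbeList, the probe callable, and
the example.org hostnames are all made up for illustration. It only shows
why a server the client once talked to keeps being probed, and keeps being
reported as down, after it has been emptied and shut off.

```python
class ServerProbeList:
    """Toy model of a client that remembers every server it has contacted."""

    def __init__(self, probe):
        # probe is a hypothetical callable: server name -> True if it answers.
        self.probe = probe
        self.known = set()

    def contact(self, server):
        # Any server the client has to talk to is remembered from then on.
        self.known.add(server)

    def check_servers(self):
        # One pass of the loop: probe every remembered server and return
        # the ones that did not answer. The real client repeats a pass
        # like this every 5 minutes or so.
        return sorted(s for s in self.known if not self.probe(s))


# Assumed example state: two fileservers, both up at first.
up = {"fs1.example.org": True, "fs2.example.org": True}
client = ServerProbeList(probe=lambda server: up.get(server, False))
client.contact("fs1.example.org")
client.contact("fs2.example.org")

up["fs1.example.org"] = False   # fs1 is emptied and taken out of service
print(client.check_servers())   # ['fs1.example.org']
```

Nothing in this model ever removes a server from the remembered set, which
is the point: the report reflects contact history, not where the data
currently lives.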
This behavior is normal and is unrelated to the problem you were actually
seeing, which was apparently about an unexpectedly missing replication
site. The 'fs checkv' (short for 'fs checkvolumes') that Kim Kimball
suggested was presumably effective because it made your cache manager
refresh its volume information, so it picked a different site next time
around.
I'd get that release problem fixed, and see if that doesn't make most of
your troubles go away. Under normal conditions, it should be sufficient to
leave an emptied fileserver up for two hours after the last volume is moved
off.
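
As a small worked example of that rule of thumb, the sketch below computes
the earliest shutdown time from the time the last volume move finished.
The function name earliest_shutdown and the sample timestamp are
assumptions for illustration; the only figure taken from above is the two
hours, and it applies under normal conditions only.

```python
from datetime import datetime, timedelta

# The two-hour figure from the paragraph above.
GRACE = timedelta(hours=2)


def earliest_shutdown(last_move_finished: datetime) -> datetime:
    """Return the earliest time the emptied fileserver can be shut down."""
    return last_move_finished + GRACE


# Assumed example: the last volume move finished at 21:00 on 2005-09-01.
print(earliest_shutdown(datetime(2005, 9, 1, 21, 0)))  # 2005-09-01 23:00:00
```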
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA