[OpenAFS] Problems on AFS Unix clients after AFS fileserver moves

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 01 Sep 2005 21:07:38 -0400


On Tuesday, August 09, 2005 10:58:22 AM -0500 Rich Sudlow <rich@nd.edu> 
wrote:

> We've been having problems with our cell for the last couple
> years with AFS clients after fileservers are taken out of service.
> Before that things seemed to work ok when doing fileserver moves and
> rebuilding. All data was moved off the fileserver but the clients
> still seem to have some need to talk to it.  In the past the AFS
> admins have left the fileservers up and empty for a number of
> days to try to resolve this issue -  but it doesn't resolve the
> issue.

That's because there is no "issue" here.  What you've just described is the 
result of the cache manager's normal checkservers loop, in which it pings 
_every server it has ever had to talk to_ every 5 minutes or so, to see if 
it is still up (or down, as the case may be).  This is also why 'fs 
checkservers' is reporting the server down -- it reports on every server 
that client has contacted since startup.
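
To make that concrete, here is a toy Python model of the behavior described above. It is only a sketch, not OpenAFS code: the class ToyCacheManager, its methods, and the example.edu host names are all invented for illustration. The point is that the list of known servers only ever grows, so a server that has been emptied and shut down keeps getting probed and keeps showing up as down.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCacheManager:
    """Hypothetical model of a client; not the real cache manager structures."""

    # Every server contacted since startup, mapped to its last known "up" state.
    # Entries are never removed in this model.
    known_servers: dict = field(default_factory=dict)

    def contact(self, server):
        """Record that the client talked to this server, e.g. to fetch a file."""
        self.known_servers.setdefault(server, True)

    def probe_all(self, is_reachable):
        """One pass of the periodic loop (run every 5 minutes or so):
        ping every known server and record whether it answered."""
        for server in self.known_servers:
            self.known_servers[server] = is_reachable(server)

    def down_servers(self):
        """Roughly what 'fs checkservers' would report in this toy model."""
        return sorted(s for s, up in self.known_servers.items() if not up)


cm = ToyCacheManager()
cm.contact("fs1.example.edu")
cm.contact("fs2.example.edu")

# fs1 is later emptied and shut down; the client still probes it.
cm.probe_all(lambda server: server != "fs1.example.edu")
print(cm.down_servers())  # prints ['fs1.example.edu']
```

Nothing in this model ever forgets fs1, which matches the "since startup" behavior described above.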


This behavior is normal and is unrelated to the problem you were actually 
seeing, which was apparently caused by an unexpectedly missing replication 
site.  The 'fs checkv' (short for 'fs checkvolumes') that Kim Kimball 
suggested was presumably effective because your cache manager picked a 
different site the next time around.

I'd get that release problem fixed, and see if that doesn't make most of 
your troubles go away.  Under normal conditions, it should be sufficient to 
leave an emptied fileserver up for two hours after the last volume is moved 
off.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA