[OpenAFS] AFS outage, impact of "moving" root.cell.readonly, root.afs.readonly

Kim Kimball dhk@ccre.com
Thu, 26 Apr 2007 11:30:31 -0600


Yesterday I removed one of several instances of root.cell.readonly and
one of several instances of root.afs.readonly, both from file server X.

Almost exactly two hours later a number of AFS clients could not access 
/afs and/or /afs/<local cell>, and the number of affected clients 
increased over the next thirty minutes or so.

I expected the clients to adjust to the removal of one instance of each
RO.  Apparently they did not.

I mounted <local cell> from a client in a different cell -- one that
most likely did not have any volume location information from <local
cell> -- and confirmed that all AFS volumes were online and available:
I was able to walk the tree mounted at <local cell>:root.afs.

To clear up the confusion on the client side I restarted all file
servers (I sure like fast restart), and we returned to normal within
five minutes.

Where did I go wrong?

Thanks!

Kim