[OpenAFS] AFS outage, impact of "moving" root.cell.readonly, root.afs.readonly

Kim Kimball dhk@ccre.com
Thu, 26 Apr 2007 14:51:41 -0600


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Yeah, I realized I left that out.<br>
<br>
File servers are running 1.4.1b<br>
<br>
Clients: 1.4.x on MacOS, some Transarc 3.6 on Solaris 9, and 1.4.4 on
RHEL4.<br>
<br>
The problem is apparently not specific to a client version or platform.<br>
<br>
Kim<br>
<br>
<br>
Jeffrey Altman wrote:
<blockquote cite="mid21153656.1177615260108.JavaMail.root@m11"
 type="cite">
  <pre wrap="">Kim Kimball wrote:
  </pre>
  <blockquote type="cite">
    <pre wrap="">Yesterday I removed one of multiple instances of root.cell.readonly
(from file server X) and one of multiple instances of root.afs.readonly
(from file server X also.)

Almost exactly two hours later a number of AFS clients could not access
/afs and/or /afs/&lt;local cell&gt;, and the number of affected clients
increased over the next thirty minutes or so.

My expectation was that the clients would adjust to the removal of one
instance of the ROs.  They apparently did not.

I mounted &lt;local cell&gt; from a client in a different cell (one that
most likely did not have any volume location information from &lt;local
cell&gt;) and confirmed that all AFS volumes were online and available: I
was able to walk the tree mounted at &lt;local cell&gt;:root.afs.

To clear up the confusion on the client side I restarted (I sure like
fast restart) all file servers and we returned to normal within five
minutes.
    </pre>
  </blockquote>
  <pre wrap="">
Which client operating system and version?


  </pre>
</blockquote>
</body>
</html>