[OpenAFS] AFS outage, impact of "moving" root.cell.readonly, root.afs.readonly

Kim Kimball dhk@ccre.com
Thu, 26 Apr 2007 16:29:25 -0600


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Oh right, I remember that bug.<br>
<br>
I have, BTW, been enjoying the fruits of your AFS Windows endeavors.<br>
<br>
The VLDB entries were correct with "vos listvl root.afs/root.cell"
during this confusion -- but may have been in an inconsistent state at
some point.<br>
<br>
The only hypothesis I have right now involves clients having bad volume
location info, but why the problems wouldn't start for two hours escapes me.&nbsp; <br>
<br>
The client refresh of the cached volume info is on a two-hour interval.&nbsp;
Surely some clients would have refreshed prior to the two-hour mark at
which the issues began.<br>
<br>
<br>
<br>
Jeffrey Altman wrote:
<blockquote cite="mid28821055.1177625211498.JavaMail.root@m11"
 type="cite">
  <pre wrap="">Kim Kimball wrote:
  </pre>
  <blockquote type="cite">
    <pre wrap="">Don't know if Windows boxes were affected or not.

I know of at least one that was active during the entire window of
confusion.

I'm analyzing the file server detailed logs (not FileLog, the -auditlog
output) now and should be able to answer the question with some level of
confidence soon.

Kim
    </pre>
  </blockquote>
  <pre wrap="">
The reason I asked about the Windows clients is that there was a bug in
the Windows clients that prevented read-only failover from working.  I
believe it was fixed prior to 1.4.0.  If your Windows clients were
working and the UNIX clients were not, that could point to a bug in the
UNIX clients.

If however the Windows clients are also failing, then it points to
something wrong in one of the databases.

Jeffrey Altman
  </pre>
</blockquote>
</body>
</html>