[PRIVATE] Re: [OpenAFS] AFS outage, impact of "moving" root.cell.readonly,
Fri, 27 Apr 2007 05:53:22 -0600
My understanding is that the cached volume location info is expired
every two hours on the *NIX boxes (including Mac).
Two hours after the removal and vos release of the root.cell and root.afs
ROs, I'd expect all of the *NIX clients to re-resolve root.afs/root.cell.
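To make the timing concrete, here is a rough sketch in Python. It is purely illustrative: it assumes a client re-resolves a volume as soon as its cached location entry reaches the two-hour age, and the cache ages are made up. It is not a description of how the cache manager is actually implemented.

```python
# Hypothetical model: each client re-resolves a volume's location when its
# cached entry reaches TTL_MINUTES of age. The 120-minute figure is the
# two-hour interval mentioned above; everything else is an assumption.
TTL_MINUTES = 120


def minutes_until_refresh(cache_age_at_release, ttl=TTL_MINUTES):
    """Minutes after the vos release until a client re-resolves the volume.

    cache_age_at_release: how old (in minutes) the client's cached
    location entry was at the moment of the release.
    """
    if not 0 <= cache_age_at_release <= ttl:
        raise ValueError("cache age must be between 0 and ttl")
    return ttl - cache_age_at_release


# Made-up cache ages for five clients at release time.
cache_ages = [0, 30, 60, 90, 119]
delays = [minutes_until_refresh(age) for age in cache_ages]
print(delays)  # [120, 90, 60, 30, 1]
```

Under these assumptions the refreshes are spread across the whole window, and no client is still holding pre-release location info once two hours have passed.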
Jeffrey Altman wrote:
> For root.afs, how many of your clients are using dynroot or freelance
> mode? All of the Windows clients are Freelance at JPL so that volume
> wouldn't have affected them.
> The root.cell volume would have been an issue but not if the clients
> had already evaluated the path to the volumes they were using
> and had drive letters assigned to them.
> That leaves the Mac, Linux and Solaris clients. Again, the question is
> how many would have needed to resolve root.afs and root.cell during the
> two hour window?
> I don't know the answer but the audit logs would.
> Jeffrey Altman
> Kim Kimball wrote:
>> Oh right, I remember that bug.
>> I have, BTW, been enjoying the fruits of your AFS Windows endeavors.
>> The VLDB entries were correct with "vos listvl root.afs/root.cell"
>> during this confusion -- but may have been in an inconsistent state at
>> some point.
>> The only hypothesis I have right now involves clients having bad volume
>> location info, but why that wouldn't start for two hours escapes me.
>> The client refresh of the cached volume info is on a 2 hr interval.
>> Surely some clients would have refreshed prior to the two hour mark at
>> which the issues began.
>> Jeffrey Altman wrote:
>>> Kim Kimball wrote:
>>>> Don't know if Windows boxes were affected or not.
>>>> I know of at least one that was active during the entire window of
>>>> I'm analyzing the file server detailed logs (not FileLog, the -auditlog
>>>> output) now and should be able to answer the question with some level of
>>>> confidence soon.
>>> The reason I asked about the Windows clients is that there was a bug in
>>> the Windows clients that prevented read-only fail over from working. I
>>> believe it was fixed prior to 1.4.0. If your Windows clients were
>>> working and the UNIX clients were not, that could point to a bug in the
>>> UNIX clients.
>>> If however the Windows clients are also failing, then it points to
>>> something wrong in one of the databases.
>>> Jeffrey Altman