[PRIVATE] Re: [OpenAFS] AFS outage, impact of "moving" root.cell.readonly, root.afs.readonly

Kim Kimball dhk@ccre.com
Fri, 27 Apr 2007 05:53:22 -0600


My understanding is that the cached volume location info is expired 
every two hours on the *NIX boxes (including Mac).

Two hours after the removal and vos release of root.cell and root.afs 
ROs I'd expect all of the *NIX clients to re-resolve root.afs/root.cell 
RO locations.
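
A rough sketch of that expectation, in case it helps frame the timing
question. It assumes (my assumption, not something I have verified in
the client code) that each client's two-hour refresh timer is at a
uniformly random point when the release happens. The function name and
the client count are made up for illustration.

```python
import random

REFRESH_INTERVAL_MIN = 120  # the two-hour expiry described above


def fraction_refreshed(elapsed_min, n_clients=10000, seed=0):
    """Fraction of simulated clients that have re-resolved volume
    locations within elapsed_min minutes of the release."""
    rng = random.Random(seed)
    refreshed = 0
    for _ in range(n_clients):
        # Assumption: the cached entry's age at release time is
        # uniform over the refresh interval.
        age = rng.uniform(0, REFRESH_INTERVAL_MIN)
        time_to_expiry = REFRESH_INTERVAL_MIN - age
        if time_to_expiry <= elapsed_min:
            refreshed += 1
    return refreshed / n_clients


if __name__ == "__main__":
    for minutes in (30, 60, 90, 120):
        print(minutes, fraction_refreshed(minutes))
```

Under that assumption the refreshed fraction grows roughly linearly
and reaches all clients at the two-hour mark, so a good share of
clients would be expected to pick up the new locations well before
then.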

Jeffrey Altman wrote:
> For root.afs, how many of your clients are using dynroot or freelance
> mode?   All of the Windows clients are Freelance at JPL so that volume
> wouldn't have affected them.
>
> The root.cell volume would have been an issue but not if the clients
> had already evaluated the path to the volumes they were using
> and had drive letters assigned to them.
>
> That leaves the Mac, Linux and Solaris clients.  Again, the question is
> how many would have needed to resolve root.afs and root.cell during the
> two hour window?
>
> I don't know the answer but the audit logs would.
>
> Jeffrey Altman
>
>
> Kim Kimball wrote:
>
>> Oh right, I remember that bug.
>>
>> I have, BTW, been enjoying the fruits of your AFS Windows endeavors.
>>
>> The VLDB entries were correct with "vos listvl root.afs/root.cell"
>> during this confusion -- but may have been in an inconsistent state at
>> some point.
>>
>> The only hypothesis I have right now involves clients having bad volume
>> location info, but why that wouldn't start for two hours escapes me. 
>>
>> The client refresh of the cached volume info is on a 2 hr interval. 
>> Surely some clients would have refreshed prior to the two hour mark at
>> which the issues began.
>>
>>
>>
>> Jeffrey Altman wrote:
>>
>>> Kim Kimball wrote:
>>>
>>>> Don't know if Windows boxes were affected or not.
>>>>
>>>> I know of at least one that was active during the entire window of
>>>> confusion.
>>>>
>>>> I'm analyzing the file server detailed logs (not FileLog, the -auditlog
>>>> output) now and should be able to answer the question with some level of
>>>> confidence soon.
>>>>
>>>> Kim
>>>>
>>> The reason I asked about the Windows clients is that there was a bug in
>>> the Windows clients that prevented read-only fail over from working.  I
>>> believe it was fixed prior to 1.4.0.  If your Windows clients were
>>> working and the UNIX clients were not, that could point to a bug in the
>>> UNIX clients.
>>>
>>> If however the Windows clients are also failing, then it points to
>>> something wrong in one of the databases.
>>>
>>> Jeffrey Altman
>>>
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>