[OpenAFS] Help needed for recovery of data of inode fileserver (Solaris 10 x86)

Hartmut Reuter reuter@rzg.mpg.de
Fri, 04 Apr 2008 15:47:33 +0200


Jeffrey Altman wrote:
> Hartmut Reuter wrote:
> 
>> Jeffrey Altman wrote:
>>
>>> Hartmut Reuter wrote:
>>>
>>>>> So what is the value of 'class' if not vLarge?
>>>>>
>>>> As you can see from that line above it's vSmall:
>>>>
>>>>  >>   [6] DistilVnodeEssence(rwVId = 536870912U, class = 1, ino =
>>>>  >> 21977313U, maxu = 0x8046bc4), line 3175 in "vol-salvage.c"
>>>>
>>>> So there might really be something wrong with the SmallVnodeFile,
>>>> but an AssertionFailed is not the best way to repair it!
>>>
>>>
>>>
>>> What the AssertionFailed means is that no one has written code to
>>> deal with a case where this error has occurred.   It can't be
>>> fixed with Salvager until someone writes the missing code.
>>
>>
>> Of course, but for the user it might be better to skip handling of 
>> this error and to continue with the next vnode. So he could get back 
>> at least the damaged volume and copy whatever is still accessible.
>>
>> So John, ifdef line 3175 and recompile. If this was a single bad vnode 
>> your volume may come online again, otherwise it's probably lost anyway.
>>
>> Hartmut
> 
> 
> I disagree.   The reason that assert is there is that continuing
> will cause more damage to the data.  We do not know based upon
> the available data whether this is a single bad vnode or whether
> perhaps the wrong file is being referenced for the SmallVnodeFile.
> 
> What is known is that one vnode, perhaps the first vnode examined
> has completely valid data except for the fact that it is in the
> wrong file.
> 
> There are several issues that are worth pursuing here.  Especially 
> because whatever the problem is has begun occurring on multiple machines:
> 
> 1. what is the actual damage that has taken place?
> 
> 2. can the damage be corrected?
> 
> 3. can the damage be avoided in the first place?  What is the cause?
> 
> Jeffrey Altman

Of course we should not remove the assert() permanently, only for a
test on this volume, which will otherwise probably be lost anyway.
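
A minimal sketch of what such a temporary change could look like. This is
not the code at line 3175 of vol-salvage.c: the names vnode_rec,
scan_vnodes and STRICT_VNODE_CHECK are made up for illustration, and the
check itself (a vnode whose recorded class does not match the file it
was found in) is only an assumption about what the assertion guards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the real OpenAFS definitions. */
enum vnode_class { CLASS_LARGE = 0, CLASS_SMALL = 1 };

struct vnode_rec {
    unsigned int number;            /* vnode number, for the log message */
    enum vnode_class stored_class;  /* class recorded in the vnode itself */
};

/*
 * Walk n records that are all expected to be of class `expected`.
 * Returns the number of consistent records; *skipped receives the
 * number of records that failed the check and were passed over.
 */
size_t scan_vnodes(const struct vnode_rec *recs, size_t n,
                   enum vnode_class expected, size_t *skipped)
{
    size_t ok = 0;

    *skipped = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].stored_class != expected) {
#ifdef STRICT_VNODE_CHECK
            /* Normal build: stop hard on the first inconsistency. */
            assert(recs[i].stored_class == expected);
#else
            /* Test build: report the bad vnode and go on to the next. */
            fprintf(stderr, "vnode %u: class %d, expected %d; skipping\n",
                    recs[i].number, (int)recs[i].stored_class,
                    (int)expected);
            (*skipped)++;
            continue;
#endif
        }
        ok++;
    }
    return ok;
}
```

Built without -DSTRICT_VNODE_CHECK, the loop logs each inconsistent vnode
and carries on; built with it, the first inconsistency aborts as before.
Such a build should only ever be run against a copy of the data or in a
mode that does not write.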

In MR-AFS we had a -nowrite option to do just a dry run. I admit that
it's a lot of work to implement this, but sometimes it is very helpful.

I just saw that -nowrite also exists in OpenAFS; only the bos command
claims it is available only in MR-AFS. So one could at least run
the salvager under the debugger with -nowrite.

Hartmut




-- 
-----------------------------------------------------------------
Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
			   	phone 		 +49-89-3299-1328
			   	fax   		 +49-89-3299-1301
RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------