[OpenAFS] Re: Crazy DAFS problem (with log)

Andrew Deason adeason@sinenomine.net
Sun, 20 Mar 2011 21:57:22 -0500


On Sun, 20 Mar 2011 20:59:43 -0500
"Ryan C. Underwood" <nemesis-lists@icequake.net> wrote:

> Not much interesting in BosLog.old:
> Sun Mar 20 04:00:29 2011: Core limits now -1 -1

Oh, hmm, this starts too late. I guess you don't have the BosLog before
this?

> 03/20/2011 16:34:31 CHECKING CLONED VOLUME 536871274.
> 03/20/2011 16:34:31 www_logs.readonly (536871274) updated 03/20/2011 05:45
> 03/20/2011 16:34:31 Vnode 22: length incorrect; (is 1773498 should be 0)
> 03/20/2011 16:34:31 SALVAGING VOLUME 536871273.

The incorrect vnode length should result in the RO getting deleted. I'm
not sure why it's not; I'll look tomorrow. I expect that's not a dafs
problem, though; just a general salvager issue.

A vnode that keeps having the wrong length might be one explanation for
the repeated salvages, but it still seems like the headers are screwed
up, which is (probably) unrelated.

> 03/20/2011 16:36:17 The volume header file /vicepa/V0536871274.vol is not associated with any actual data (deleted)
> 03/20/2011 16:40:14 dispatching child to salvage volume 536871273...
> 03/20/2011 16:40:14 2 nVolumesInInodeFile 56 
> 03/20/2011 16:40:14 CHECKING CLONED VOLUME 536871274.
> 03/20/2011 16:40:14 www_logs.readonly (536871274) updated 03/20/2011 16:39
> 03/20/2011 16:40:14 Vnode 22: length incorrect; (is 1794950 should be 0)

Do you know if this volume was recreated between these times? I'm not
sure if the salvager is just lying, but this would indicate it deleted
the RO, and then it came back again and again has invalid metadata.

> [.. ad infinitum ]

Actually, it may help to continue for an interation or two more. I don't
think these salvages have repeated themselves yet.

> > > Sun Mar 20 04:04:01 2011 VAttachVolume: Error reading diskDataHandle header for vol 536870916; error=101
> > > Sun Mar 20 04:04:01 2011 Scheduling salvage for volume 536870916 on part /vicepa over SALVSYNC
> > 
> > "volume metadata is corrupt for 536870916"
> 
> It's a RO clone, should I just delete it and add it back?

Yes, if it'll let you. The salvage is supposed to delete it for you, but
if it's not it's not.

> Does this look normal?  I.e., a whole lot of zero-length vnodes
> attached to the same vice file.

Those are fine; vnodes like this:

> 15040 Vnode 472.0.0 cloned: 0, length: 0 linkCount: 0 parent: 0 UFS-Filename: /vicepa/AFSIDat/d=/d3++U/+/+/+

That is, vnodes like "Vnode X.0.0", are 'blank'; the vnode entry is just
zeroed out because the indices are expanded by some amount at a time in
chunks (512, or something?).

Based on the salsrv logs, I'd be looking at vnodes 264, 266, and 268 in
536870915/6, and 22, 202, and 204 in 536871273/4. That is, see if the
'UFS-Filename' file on disk actually has the length in the 'volinfo'
output.

And by the way:

>>>Shortly after the weekly scheduled fileserver restart

You shouldn't need to have these on. But they shouldn't be causing
salvages or whatever, either.

-- 
Andrew Deason
adeason@sinenomine.net