[OpenAFS] Re: Crazy DAFS problem (with log)

Andrew Deason adeason@sinenomine.net
Mon, 21 Mar 2011 14:05:58 -0500


On Sun, 20 Mar 2011 22:54:38 -0500
"Ryan C. Underwood" <nemesis-lists@icequake.net> wrote:

> > Do you know if this volume was recreated between these times? I'm
> > not sure if the salvager is just lying, but this would indicate it
> > deleted the RO, and then it came back again and again has invalid
> > metadata.
> 
> If by recreated you mean a remsite/addsite, the answer is no.

Recreated for an RO would be a release (for a backup it would be a 'vos
backup').

> Long SalsrvLog excerpt:
[...]
> 03/20/2011 04:02:05 SALVAGING VOLUME 536871273.
> 03/20/2011 04:02:05 www_logs (536871273) updated 03/19/2011 18:35
> 03/20/2011 04:02:05 Vnode 22: length incorrect; changed from 1683156 to 0
> 03/20/2011 04:02:05 Vnode 24: length incorrect; changed from 184572 to 0
> 03/20/2011 04:02:05 Vnode 202: length incorrect; changed from 232747 to 0
> 03/20/2011 04:02:05 totalInodes 245
> 03/20/2011 04:02:05 Salvaged www_logs (536871273): 238 files, 241815 blocks
[...]
> 03/20/2011 04:04:13 SALVAGING VOLUME 536871273.
> 03/20/2011 04:04:13 www_logs (536871273) updated 03/19/2011 18:35
> 03/20/2011 04:04:13 totalInodes 242
> 03/20/2011 04:04:13 Salvaged www_logs (536871273): 238 files, 241815 blocks
[...]
> 03/20/2011 04:14:14 SALVAGING VOLUME 536871273.
> 03/20/2011 04:14:14 www_logs (536871273) updated 03/20/2011 04:14
> 03/20/2011 04:14:14 Vnode 22: length incorrect; changed from 1695692 to 0
> 03/20/2011 04:14:14 Vnode 202: length incorrect; changed from 288593 to 0
> 03/20/2011 04:14:14 totalInodes 246
> 03/20/2011 04:14:14 Salvaged www_logs (536871273): 238 files, 242062 blocks
[...]
> 03/20/2011 04:16:02 www_logs (536871273) updated 03/20/2011 04:15
> 03/20/2011 04:16:02 totalInodes 242
> 03/20/2011 04:16:02 Salvaged www_logs (536871273): 238 files, 244003 blocks
[...]
> 03/20/2011 04:20:04 SALVAGING VOLUME 536871273.
> 03/20/2011 04:20:04 www_logs (536871273) updated 03/20/2011 04:20
> 03/20/2011 04:20:04 Vnode 22: length incorrect; changed from 1698459 to 0
> 03/20/2011 04:20:04 Vnode 24: length incorrect; changed from 253718 to 0
> 03/20/2011 04:20:04 Vnode 202: length incorrect; changed from 305092 to 0
> 03/20/2011 04:20:04 totalInodes 245
> 03/20/2011 04:20:04 Salvaged www_logs (536871273): 238 files, 241896 blocks

"what". Lots of stuff like this in there. It doesn't exactly make a lot
of sense; what it _looks_ like is that the vnodes are getting appended
to, and then the on-disk /vicep file is getting truncated. And it's
happening repeatedly, sometimes in the span of a second.

Can you find out any more information about those files? Both in AFS and
in /vicepX. To find out which files they are in AFS, 'fs getfid' can
tell you the vnode of a file, like so:

$ fs getfid .
File . (536870916.1.1) contained in volume 536870916

which is vnode 1. Vnodes 22, 24, and 202 will all be regular files (or
mountpoints, symlinks, etc, but they look like they'd be regular files).

To get the on-disk /vicepX file, you can get that from 'volinfo /vicepX
536871273 -filename'. Knowing the size of the /vicepX file would be
nice, and so would identifying what they are in AFS and what their
access pattern is (if you know it).

And, of course, like Derrick said, it would be good to see what happens
when HAVE_PIO is disabled. It _could_ be something like ihandle having a
file handle open to the wrong file, I think.

-- 
Andrew Deason
adeason@sinenomine.net