[OpenAFS-devel] CopyOnWrite failures continue still
hoffman@cs.pitt.edu
Fri, 29 Mar 2002 09:51:33 -0500 (EST)
Marcus,
Thanks for your detailed response.
> (1) grep for EIO in the kernel source
Did that -- only 2,440 instances. :-)
I think this one was answered by Chaskiel Grundman yesterday.
The EIO errors that I was seeing were happening only when the corrupted
volume did NOT cause a CopyOnWrite error. The CopyOnWrite errors have
never been accompanied by an EIO.
> (2) do a "strace" on the fileserver process
> (3) The AFS source at least used to come with standalone utilities
> that would poke at the filesystem directly
I will try these.
> (4) try doing a "vos dump" for the affected volume, see what you get,
Here's what I got:
[root@spot]# vos dump -id 536878347 -file /tmp/536878347.dump \
-server spot -partition /vicepc -verbose
Could not start transaction on the volume 536878347 to be dumped
Volume needs to be salvaged
Error in vos dump command.
Volume needs to be salvaged
[root@spot]#
VolserLog reports:
Fri Mar 29 09:35:21 2002 VAttachVolume: volume salvage flag is ON for
/vicepc/V0536878347.vol; volume needs salvage
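For anyone who wants to pull the affected volumes out of a VolserLog without reading it by hand, here is a minimal sketch. It assumes the message wording matches the excerpt above (other OpenAFS versions may phrase it differently), and the function name volumes_needing_salvage is just an illustrative helper, not part of any AFS tool.

```python
import re

# Assumption: the log message looks like the VolserLog excerpt above.
# \s+ lets the pattern match even when the line has been wrapped.
SALVAGE_RE = re.compile(
    r"VAttachVolume: volume salvage flag is ON for\s+(\S+/V(\d+)\.vol)"
)


def volumes_needing_salvage(log_text):
    """Return sorted, unique (partition, volume_id) pairs flagged for salvage."""
    found = set()
    for match in SALVAGE_RE.finditer(log_text):
        path = match.group(1)
        vol_id = int(match.group(2))          # drops the leading zero in the file name
        partition = path.rsplit("/", 1)[0]    # e.g. "/vicepc"
        found.add((partition, vol_id))
    return sorted(found)


sample = (
    "Fri Mar 29 09:35:21 2002 VAttachVolume: volume salvage flag is ON for\n"
    "/vicepc/V0536878347.vol; volume needs salvage\n"
)
print(volumes_needing_salvage(sample))
```

On the excerpt above this yields the same partition and volume ID that I passed to vos dump.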
> (5) try running the salvager on the affected volume(s), see
> what the salvager sees.
This is what we always do; the log of yesterday's salvage is in the
mbell.log file in my public FTP area.
> (6) If strace doesn't show EIO coming back from kernel-land,
> then another avenue to investigate is libc -- is
> there something in there that could be returning EIO
> to the fileserver (there shouldn't be, not for this,
> but you never know...) Also check for anything in
> the AFS libraries proper that might just happen to
> return this.
And this is precisely what Chaskiel found, in viced/physio.c.
Thanks,
---Bob.