[OpenAFS-devel] CopyOnWrite failure
Marco Foglia
marco.foglia@psi.ch
Tue, 19 Mar 2002 14:24:58 +0100
Ted Anderson wrote:
> Srikanth has found a nasty bug in vol/ihandle.c:ih_reallyclose() when it
> is called on an inode handle (ih) with multiple file descriptors (fd)
> one of which is INUSE. The bug corrupts the threads linking the ih and
> the fd's it caches. It is easy to see how the same fd could be on the
> list of two ih's and lead to asserts in both ih_open and fd_close. In
> the process you might have I/O going to the wrong file. So this could
> explain several of these file server problems.
>
> Basically, the loop at the top of ih_reallyclose doesn't remove the fd
> from the ih list when adding it to the list to be closed. If
> IH_REALLY_CLOSED is set (namely, some fh are INUSE), then the fd's are
> added to the fdAvail list but also left on the ih list.
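To make the failure mode described above concrete, here is a minimal sketch of moving idle descriptors from one doubly linked list to another, unlinking each node from its owner before appending it elsewhere. All names (fd_node, fd_list, release_idle) are hypothetical stand-ins and not the structures or the patch in vol/ihandle.c; the point is only that skipping the unlink step leaves a node reachable from two lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the OpenAFS data structures. */
struct fd_node {
    int fd;
    int in_use;               /* nonzero while some thread is using the fd */
    struct fd_node *next;
    struct fd_node *prev;
};

struct fd_list {
    struct fd_node *head;
    struct fd_node *tail;
};

/* Append n at the tail of l. */
static void list_append(struct fd_list *l, struct fd_node *n)
{
    assert(l != NULL && n != NULL);
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
}

/* Unlink n from l and clear its links. */
static void list_remove(struct fd_list *l, struct fd_node *n)
{
    assert(l != NULL && n != NULL);
    if (n->prev)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    n->next = n->prev = NULL;
}

/* Count the nodes reachable from l->head. */
static int list_length(const struct fd_list *l)
{
    int count = 0;
    const struct fd_node *n;

    for (n = l->head; n != NULL; n = n->next)
        count++;
    return count;
}

/*
 * Move every idle node from owner to avail and return how many moved.
 * The node is unlinked from owner BEFORE it is appended to avail;
 * omitting list_remove() here would leave it on both lists.
 */
static int release_idle(struct fd_list *owner, struct fd_list *avail)
{
    int moved = 0;
    struct fd_node *n = owner->head;

    while (n != NULL) {
        struct fd_node *next = n->next;   /* save before unlinking */
        if (!n->in_use) {
            list_remove(owner, n);
            list_append(avail, n);
            moved++;
        }
        n = next;
    }
    return moved;
}
```

In this sketch a node that is still in use stays on the owner list, and every other node ends up on exactly one list, which is the invariant the quoted description says was being violated.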
Hello,
I tried your patch to vol/ihandle.c, but the volume corruption happened
again this morning:
> @(#) OpenAFS 1.2.3 built 2002-03-15
> 03/19/2002 11:31:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepa 536875505 -nowrite -showlog)
> 03/19/2002 11:31:21 CHECKING CLONED VOLUME 536875507(READONLY mode).
> 03/19/2002 11:31:21 user.meier.backup (536875507) it would have been updated 03/19/2002 00:03
> 03/19/2002 11:31:22 SALVAGING VOLUME 536875505(READONLY mode).
> 03/19/2002 11:31:22 user.meier (536875505) it would have been updated 03/19/2002 11:29
> 03/19/2002 11:31:22 Vnode 14318: version < inode version; fixed (old status)
> 03/19/2002 11:31:22 Vnode 1: length incorrect; changed from 6144 to 0
> 03/19/2002 11:31:22 Vnode 1003: length incorrect; changed from 2048 to 0
> 03/19/2002 11:31:22 Vnode 1007: length incorrect; changed from 2048 to 0
> ...
> ...
> 03/19/2002 11:31:22 Directory bad, vnode 1491; skipping...
> 03/19/2002 11:31:22 Vnode 1: link count incorrect (was 42, would have changed to -1)
> 03/19/2002 11:31:22 Found 11238 orphaned files and directories (approx. 564868 KB)
> 03/19/2002 11:31:22 It would have Salvaged user.meier (536875505): 11615 files, 565478 blocks
> 03/19/2002 11:31:22 SALVAGING OF PARTITION /vicepa (READONLY mode) COMPLETED
The kernel version was 2.2.19-6.2.12smp (Red Hat), and the normal
multithreaded AFS fileserver was in use.
If you look at Vnode 1 (=++++2) of this volume on /vicepa, you can
see that the copy (=+++22) has length 0.
# cd /vicepa/AFSIDat/l1/l5=+U/+/+
# ls -l =+++*
---------- 1 12433 root 6144 Mar 18 19:09 =++++2
---------x 1 12494 root 0 Mar 19 09:00 =+++22
The last modification time of =+++22 is 09:00, which is when the
user of this volume started working.
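The manual comparison above could also be done programmatically. The following is a rough sketch using POSIX stat(), which needs no read permission on the files themselves; the helper names copy_is_truncated and report_pair are my own, and a shorter copy is only a hint of trouble, not proof of a copy-on-write failure.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns 1 if `copy` is smaller than `orig`, 0 if it is not,
 * and -1 if either file cannot be examined with stat().
 */
static int copy_is_truncated(const char *orig, const char *copy)
{
    struct stat so, sc;

    assert(orig != NULL && copy != NULL);
    if (stat(orig, &so) != 0 || stat(copy, &sc) != 0)
        return -1;
    return sc.st_size < so.st_size ? 1 : 0;
}

/* Print a one-line verdict for a pair of files. */
static void report_pair(const char *orig, const char *copy)
{
    int r = copy_is_truncated(orig, copy);

    if (r < 0)
        perror("stat");
    else
        printf("%s vs %s: %s\n", orig, copy,
               r ? "copy is shorter" : "copy is not shorter");
}
```

Called from inside the directory shown above as report_pair("=++++2", "=+++22"), this would flag the zero-length copy.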
Marco
--
Marco Foglia | Paul Scherrer Institut | phone +41 56 310 36 39
| CH-5232 Villigen | mailto:marco.foglia@psi.ch
-------------------------------------------------------------------