[OpenAFS-devel] CopyOnWrite failure

Marco Foglia marco.foglia@psi.ch
Tue, 19 Mar 2002 14:24:58 +0100


Ted Anderson wrote:
> Srikanth has found a nasty bug in vol/ihandle.c:ih_reallyclose() when it
> is called on an inode handle (ih) with multiple file descriptors (fd)
> one of which is INUSE.  The bug corrupts the threads linking the ih and
> the fd's it caches.  It is easy to see how the same fd could be on the
> list of two ih's and lead to asserts in both ih_open and fd_close.  In
> the process you might have I/O going to the wrong file.  So this could
> explain several of these file server problems.
> 
> Basically, the loop at the top of ih_reallyclose doesn't remove the fd
> from the ih list when adding it to the list to be closed.  If
> IH_REALLY_CLOSED is set (namely, some fd's are INUSE), then the fd's are
> added to the fdAvail list but also left on the ih list.
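
As a toy illustration of the list handling described in the quoted text
(this is not the actual vol/ihandle.c code; every type and function name
below is hypothetical), the important step is to unlink an fd from its
handle's list before putting it on any other list. An fd that is still in
use stays where it is:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not the real ihandle.c structures. */
typedef struct Fd {
    int in_use;                     /* nonzero while a thread uses the fd */
    struct Fd *ih_prev, *ih_next;   /* links on the owning handle's list */
    struct Fd *avail_next;          /* link on the "available" list */
} Fd;

typedef struct {
    Fd *head, *tail;                /* per-handle list of cached fds */
} IhList;

/* Append fd to the tail of the handle's list. */
static void ih_append(IhList *l, Fd *fd)
{
    assert(l != NULL && fd != NULL);
    fd->ih_next = NULL;
    fd->ih_prev = l->tail;
    if (l->tail)
        l->tail->ih_next = fd;
    else
        l->head = fd;
    l->tail = fd;
}

/* Remove fd from the handle's list and clear its handle links. */
static void ih_unlink(IhList *l, Fd *fd)
{
    assert(l != NULL && fd != NULL);
    if (fd->ih_prev)
        fd->ih_prev->ih_next = fd->ih_next;
    else
        l->head = fd->ih_next;
    if (fd->ih_next)
        fd->ih_next->ih_prev = fd->ih_prev;
    else
        l->tail = fd->ih_prev;
    fd->ih_prev = fd->ih_next = NULL;
}

/* Move every idle fd from the handle's list to *avail and return how
 * many were moved.  In-use fds are left on the handle's list. */
static int release_idle(IhList *l, Fd **avail)
{
    int moved = 0;
    Fd *fd = l->head;

    while (fd) {
        Fd *next = fd->ih_next;     /* save before the links are cleared */
        if (!fd->in_use) {
            ih_unlink(l, fd);       /* the step reported as missing */
            fd->avail_next = *avail;
            *avail = fd;
            moved++;
        }
        fd = next;
    }
    return moved;
}
```

If the ih_unlink() call is left out, an fd ends up reachable from both
lists at once, which is the kind of double membership the quoted text
describes.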

Hello,

I tried your patch to vol/ihandle.c, but the volume corruption happened
again this morning:

> @(#) OpenAFS 1.2.3 built  2002-03-15
> 03/19/2002 11:31:19 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepa 536875505 -nowrite -showlog)
> 03/19/2002 11:31:21 CHECKING CLONED VOLUME 536875507(READONLY mode).
> 03/19/2002 11:31:21 user.meier.backup (536875507) it would have been updated 03/19/2002 00:03
> 03/19/2002 11:31:22 SALVAGING VOLUME 536875505(READONLY mode).
> 03/19/2002 11:31:22 user.meier (536875505) it would have been updated 03/19/2002 11:29
> 03/19/2002 11:31:22 Vnode 14318: version < inode version; fixed (old status)
> 03/19/2002 11:31:22 Vnode 1: length incorrect; changed from 6144 to 0
> 03/19/2002 11:31:22 Vnode 1003: length incorrect; changed from 2048 to 0
> 03/19/2002 11:31:22 Vnode 1007: length incorrect; changed from 2048 to 0
> ...
> ...
> 03/19/2002 11:31:22 Directory bad, vnode 1491; skipping...
> 03/19/2002 11:31:22 Vnode 1: link count incorrect (was 42, would have changed to -1)
> 03/19/2002 11:31:22 Found 11238 orphaned files and directories (approx. 564868 KB)
> 03/19/2002 11:31:22 It would have Salvaged user.meier (536875505): 11615 files, 565478 blocks
> 03/19/2002 11:31:22 SALVAGING OF PARTITION /vicepa (READONLY mode) COMPLETED

The kernel version was 2.2.19-6.2.12smp (Red Hat), and the normal
multithreaded AFS fileserver was in use.

If you look at Vnode 1 (=++++2) of this volume on /vicepa, you can
see that the copy (=+++22) has length 0:

# cd /vicepa/AFSIDat/l1/l5=+U/+/+
# ls -l =+++*
----------    1 12433    root         6144 Mar 18 19:09 =++++2
---------x    1 12494    root            0 Mar 19 09:00 =+++22

The last modification time of =+++22 is 09:00, which is when
the user of this volume started working.

Marco

--
Marco Foglia | Paul Scherrer Institut  | phone     +41 56 310 36 39 
             | CH-5232 Villigen        | mailto:marco.foglia@psi.ch
-------------------------------------------------------------------