[OpenAFS-devel] CopyOnWrite failure

Ted Anderson ota@transarc.com
Thu, 14 Mar 2002 15:13:33 -0500 (EST)


Srikanth has found a nasty bug in vol/ihandle.c:ih_reallyclose() when it
is called on an inode handle (ih) with multiple file descriptors (fd),
one of which is INUSE.  The bug corrupts the list links that tie the ih
to the fd's it caches.  It is easy to see how the same fd could end up
on the lists of two ih's and lead to asserts in both ih_open and
fd_close.  In the process you might have I/O going to the wrong file.
So this could explain several of these file server problems.

Basically, the loop at the top of ih_reallyclose doesn't remove the fd
from the ih list when adding it to the list to be closed.  If
IH_REALLY_CLOSED is set (namely, some fd's are INUSE), then the closed
fd's are added to the fdAvail list but also left on the ih list.
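
To make the failure mode concrete, here is a minimal sketch in C.  It is
not the ihandle.c code: the struct layouts and the function names
(ih_append, ih_unlink, collect_idle, close_all, count_stale) are
hypothetical stand-ins, and only the two sets of link fields are modeled
on fd_next/fd_prev and fd_ihnext/fd_ihprev.  It assumes an fd that has
been closed may be handed to another ih, which is what makes a leftover
entry on the old ih list dangerous.

```c
#include <assert.h>
#include <stddef.h>

struct ih;

/* Hypothetical stand-in for the fd handle: one node, two sets of links. */
struct fd {
    int in_use;                  /* nonzero models FD_HANDLE_INUSE */
    struct ih *owner;            /* models fd_ih */
    struct fd *next, *prev;      /* LRU / close-list links (fd_next, fd_prev) */
    struct fd *ihnext, *ihprev;  /* per-ih list links (fd_ihnext, fd_ihprev) */
};

/* Hypothetical stand-in for the inode handle. */
struct ih {
    struct fd *fdhead, *fdtail;
};

/* Append fd to the tail of ih's own list. */
void ih_append(struct ih *ih, struct fd *fd)
{
    fd->owner = ih;
    fd->ihnext = NULL;
    fd->ihprev = ih->fdtail;
    if (ih->fdtail != NULL)
        ih->fdtail->ihnext = fd;
    else
        ih->fdhead = fd;
    ih->fdtail = fd;
}

/* Unlink fd from ih's own list; this is the step the buggy path skips. */
void ih_unlink(struct ih *ih, struct fd *fd)
{
    if (fd->ihprev != NULL)
        fd->ihprev->ihnext = fd->ihnext;
    else
        ih->fdhead = fd->ihnext;
    if (fd->ihnext != NULL)
        fd->ihnext->ihprev = fd->ihprev;
    else
        ih->fdtail = fd->ihprev;
    fd->ihnext = fd->ihprev = NULL;
}

/* Move every idle fd onto a temporary close list threaded through
 * next/prev.  With unlink_from_ih == 0 the fd also stays on the ih
 * list, which mimics the bug. */
struct fd *collect_idle(struct ih *ih, int unlink_from_ih)
{
    struct fd *head = NULL, *tail = NULL, *fd, *nxt;

    for (fd = ih->fdhead; fd != NULL; fd = nxt) {
        nxt = fd->ihnext;        /* save before a possible unlink */
        if (fd->in_use)
            continue;
        if (unlink_from_ih)
            ih_unlink(ih, fd);
        fd->next = NULL;
        fd->prev = tail;
        if (tail != NULL)
            tail->next = fd;
        else
            head = fd;
        tail = fd;
    }
    return head;
}

/* "Close" each fd on the close list and release it; returns the count. */
int close_all(struct fd *head)
{
    int n = 0;
    struct fd *fd;

    for (fd = head; fd != NULL; fd = fd->next) {
        fd->owner = NULL;        /* the fd is now free for reuse elsewhere */
        n++;
    }
    return n;
}

/* Count fd's still reachable from ih that no longer belong to it. */
int count_stale(const struct ih *ih)
{
    int n = 0;
    const struct fd *fd;

    for (fd = ih->fdhead; fd != NULL; fd = fd->ihnext)
        if (fd->owner != ih)
            n++;
    return n;
}
```

With three fd's on one ih and the middle one in use,
close_all(collect_idle(&ih, 0)) closes two fd's and count_stale(&ih)
then reports 2: both closed fd's are still on the ih list.  With
collect_idle(&ih, 1) the count is 0 and only the in-use fd remains.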

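Below is a quick patch; we'll get a cleaned-up fix out in a few days.
Note that the diff was taken from my fixed working copy to the stock
1.2.3 source, so the fix is the "***" side of each hunk: the lines
marked "-" are the ones to add, and the fd_next loop replaces the
fd_ihnext one.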

Ted Anderson

*** c:/docume~1/ota/desktop/work/afs/ihandle.c	Thu Mar 14 15:05:31 2002
--- f:/afs/OpenAFS/openafs-1.2.3/src/vol/ihandle.c	Fri Oct 12 23:22:10 2001
***************
*** 684,695 ****
  
      if (!(ihP->ih_flags & IH_REALLY_CLOSED))
          DLL_INIT_LIST(ihP->ih_fdhead, ihP->ih_fdtail);
-     else {
-         for (fdP = head; fdP != NULL ; fdP = fdP->fd_next) {
-             DLL_DELETE(fdP, fdP->fd_ih->ih_fdhead, fdP->fd_ih->ih_fdtail,
-                 fd_ihnext, fd_ihprev);
-         }
-     }
  
      if (head == NULL) {
          IH_UNLOCK
--- 685,690 ----
***************
*** 700,706 ****
       * Close the file descriptors
       */
      closeCount = 0;
!     for (fdP = head ; fdP != NULL ; fdP = fdP->fd_next) {
          IH_UNLOCK
          OS_CLOSE(fdP->fd_fd);
          IH_LOCK
--- 695,701 ----
       * Close the file descriptors
       */
      closeCount = 0;
!     for (fdP = head ; fdP != NULL ; fdP = fdP->fd_ihnext) {
          IH_UNLOCK
          OS_CLOSE(fdP->fd_fd);
          IH_LOCK