[OpenAFS] 1.4.1-rc4 volumes detaching

Terry McCoy terry@nd.edu
Wed, 18 Jan 2006 19:25:12 -0500 (EST)


We are running 1.4.1-rc4 on Solaris 8.

The file server binary has the patch "h-enumerate-r-fix-for-real-20060116"
that was created for src/viced/host.c (revision 1.57.2.8)


  ****
  ****
     It appears this patch has solved our problem with the fileserver process
     crashing and dumping core.

     [grand.central.org #24196] More 1.4.1-rc2  file server core dumps
  ****
  ****


Over the last 24 hours we have had about six volumes become detached
for no apparent reason.

Here are messages from two file servers from the VolserLog and the SalvageLog


bubba> tail -10 /usr/afs/logs/VolserLog
Wed Jan 18 02:23:06 2006 <LocalAuth> is executing Reclone Volume 2039009069
Wed Jan 18 02:23:06 2006 1 Volser: Clone: Recloning volume 2039009067 to volume 2039009069
Wed Jan 18 02:23:06 2006 <LocalAuth> is executing Reclone Volume 2039009075
Wed Jan 18 02:23:06 2006 1 Volser: Clone: Recloning volume 2039009073 to volume 2039009075
Wed Jan 18 02:23:06 2006 <LocalAuth> is executing Reclone Volume 2039117564
Wed Jan 18 02:23:06 2006 1 Volser: Clone: Recloning volume 2039117562 to volume 2039117564
Wed Jan 18 02:23:06 2006 <LocalAuth> is executing Reclone Volume 2039117567
Wed Jan 18 02:23:06 2006 1 Volser: Clone: Recloning volume 2039117565 to volume 2039117567
Wed Jan 18 12:33:48 2006 VAttachVolume: volume salvage flag is ON for /vicepa/V2038040296.vol; volume needs salvage
Wed Jan 18 12:33:48 2006 1 Volser: ListVolumes: Could not attach volume 2038040296 (/vicepa:V2038040296.vol), error=101


bubba> more /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.1-rc4 built  2006-01-13
01/18/2006 12:37:04 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepa 2038040296)
01/18/2006 12:37:15 Scanning inodes on device /dev/rdsk/c0t0d0s5...
01/18/2006 12:40:47 2 nVolumesInInodeFile 56
01/18/2006 12:40:51 CHECKING CLONED VOLUME 2038040298.
01/18/2006 12:40:51 user.purchsng.backup (2038040298) updated 01/17/2006 17:31
01/18/2006 12:40:51 SALVAGING VOLUME 2038040296.
01/18/2006 12:40:51 user.purchsng (2038040296) updated 01/18/2006 11:40
01/18/2006 12:40:51 Vnode 27280: version < inode version; fixed (old status)
01/18/2006 12:40:51 totalInodes 15341
01/18/2006 12:40:53 Vnode 2971: link count incorrect (was 7, now 6)
01/18/2006 12:40:53 Found 1 orphaned files and directories (approx. 2 KB)
01/18/2006 12:40:53 Salvaged user.purchsng (2038040296): 15323 files, 7581843 blocks


corvett> date ; vos ex user.purchsng
Wed Jan 18 19:13:17 EST 2006
user.purchsng                    2038040296 RW    7581915 K  On-line
    bubba.helios.nd.edu /vicepa
    RWrite 2038040296 ROnly          0 Backup 2038040298
    MaxQuota    8000000 K
    Creation    Fri Apr 28 08:04:30 2000
    Last Update Wed Jan 18 18:19:05 2006
    52865 accesses in the past day (i.e., vnode references)

    RWrite: 2038040296    Backup: 2038040298
    number of sites -> 1
       server bubba.helios.nd.edu partition /vicepa RW Site






reno> tail -10 /usr/afs/logs/VolserLog
Wed Jan 18 01:44:05 2006 <LocalAuth> is executing Reclone Volume 2038941398
Wed Jan 18 01:44:05 2006 1 Volser: Clone: Recloning volume 2038941396 to volume 2038941398
Wed Jan 18 01:44:05 2006 <LocalAuth> is executing Reclone Volume 2038941422
Wed Jan 18 01:44:05 2006 1 Volser: Clone: Recloning volume 2038941420 to volume 2038941422
Wed Jan 18 01:44:05 2006 <LocalAuth> is executing Reclone Volume 2038941736
Wed Jan 18 01:44:05 2006 1 Volser: Clone: Recloning volume 2038941734 to volume 2038941736
Wed Jan 18 01:44:05 2006 <LocalAuth> is executing Reclone Volume 2038944818
Wed Jan 18 01:44:05 2006 1 Volser: Clone: Recloning volume 2038944816 to volume 2038944818
Wed Jan 18 15:17:57 2006 VAttachVolume: volume salvage flag is ON for /vicepb/V2037420501.vol; volume needs salvage
Wed Jan 18 15:17:57 2006 1 Volser: XListVolumes: Could not attach volume 2037420501


reno> more /usr/afs/logs/SalvageLog
@(#) OpenAFS 1.4.1-rc4 built  2006-01-13
01/18/2006 15:39:51 STARTING AFS SALVAGER 2.4 (/usr/afs/bin/salvager /vicepb 2037420501)
01/18/2006 15:40:02 Scanning inodes on device /dev/rdsk/c0t0d1s5...
01/18/2006 15:43:01 2 nVolumesInInodeFile 56
01/18/2006 15:43:01 CHECKING CLONED VOLUME 2037420503.
01/18/2006 15:43:01 sys.coba_1.backup (2037420503) updated 01/17/2006 18:57
01/18/2006 15:43:02 SALVAGING VOLUME 2037420501.
01/18/2006 15:43:02 sys.coba_1 (2037420501) updated 01/18/2006 14:14
01/18/2006 15:43:02 Vnode 3558: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 20684: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 21034: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 21410: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 21430: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 21594: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 21598: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 44790: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 52616: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 58136: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 87148: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 103630: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105088: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105090: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105092: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105094: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105096: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 105098: version < inode version; fixed (old status)
01/18/2006 15:43:02 Vnode 106068: version < inode version; fixed (old status)
01/18/2006 15:43:03 totalInodes 79247
01/18/2006 15:43:07 dir vnode 6185: ??/.. (vnode 5295): unique changed from 1 to 628482 -- deleted
01/18/2006 15:43:10 Vnode 5295: link count incorrect (was 2, now 1)
01/18/2006 15:43:10 Found 5 orphaned files and directories (approx. 77 KB)
01/18/2006 15:43:10 Salvaged sys.coba_1 (2037420501): 79169 files, 14498401 blocks


corvett> date ; vos ex sys.coba_1
Wed Jan 18 19:12:16 EST 2006
sys.coba_1                       2037420501 RW   14500887 K  On-line
    reno.helios.nd.edu /vicepb
    RWrite 2037420501 ROnly          0 Backup 2037420503
    MaxQuota   15000000 K
    Creation    Wed Jul 31 10:42:08 1996
    Last Update Wed Jan 18 18:34:05 2006
    1013792 accesses in the past day (i.e., vnode references)

    RWrite: 2037420501    Backup: 2037420503
    number of sites -> 1
       server reno.helios.nd.edu partition /vicepb RW Site






+--    --    --    --    --    --    --    --    --    --    --    --    --   +
|                                                                             |
|  Terry McCoy                                  email:  terry@nd.edu          |
|  Sr Systems Engineer                          phone:  (574) 631-4274        |
|                                                                             |
|  Office of Information Technologies                                         |
|  315 Information Technology Center                                          |
|  University of Notre Dame                                                   |
|  Notre Dame, Indiana  46556                                                 |
|                                                                             |
|                                                                             |
|  perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);' |
|                                                                             |
+--    --    --    --    --    --    --    --    --    --    --    --    --   +