[OpenAFS] Another bug (?) in openafs-1.6.0rc2 on Scientific Linux 6: backup volume corrupted

John Tang Boyland boyland@uwm.edu
Mon, 10 Oct 2011 15:04:56 -0500


Dear OpenAFS,

   I've noticed another serious problem with openafs-1.6.0rc2 on
Scientific Linux 6.  I have a reprieve of a couple of hours this afternoon and
will try to build an RPM from scratch for openafs-server, using directions
kind people have sent in the past.  If this is a known bug, I hope to
be rid of it that way, but for the record:

Another student's AFS volume was hit with a client/server bug (?) that
zeroed out his homework file.  When I went back to the backups, I
discovered (coincidentally?) that two days after the file got deleted,
the backup volume had serious troubles.  This was just after the
fileserver crashed, as I related in a previous email.

When the salvager ran on this student's backup volume, there were problems
(student ID replaced with XXX to keep the login name out of the email
archive):

10/05/2011 17:39:00 SALVAGING VOLUME 536876336.
10/05/2011 17:39:00 fa11.cs351.XXX (536876336) updated 10/03/2011 22:02
10/05/2011 17:39:00 Vnode 2: version < inode version; fixed (old status)
10/05/2011 17:39:00 Vnode 4: version < inode version; fixed (old status)
10/05/2011 17:39:00 Vnode 156: length incorrect; changed from 20131 to 0
10/05/2011 17:39:00 Vnode 160: length incorrect; changed from 8950 to 0
10/05/2011 17:39:00 Vnode 174: length incorrect; changed from 2159 to 0
10/05/2011 17:39:00 Vnode 178: length incorrect; changed from 2536 to 0
10/05/2011 17:39:00 totalInodes 182
10/05/2011 17:39:00 Salvaged fa11.cs351.XXX (536876336): 175 files, 468 blocks
10/05/2011 17:39:00 dispatching child to salvage volume 536876338...
10/05/2011 17:39:00 dispatching child to salvage volume 536876336...
10/05/2011 17:39:00 namei_ListAFSSubDirs: warning: VG 536876338 does not have a link table; salvager will recreate it.
10/05/2011 17:39:00 fileserver requested salvage of clone 536876338; scheduling salvage of volume group 536876336...
10/05/2011 17:39:00 1 nVolumesInInodeFile 28
10/05/2011 17:39:00 SALVAGING VOLUME 536876336.
10/05/2011 17:39:00 fa11.cs351.XXX (536876336) updated 10/03/2011 22:02
10/05/2011 17:39:00 totalInodes 179
10/05/2011 17:39:00 Salvaged fa11.cs351.XXX (536876336): 175 files, 468 blocks
10/05/2011 17:39:00 The volume header file /vicepa/V0536876338.vol is not associated with any actual data (deleted)
10/05/2011 17:41:01 dispatching child to salvage volume 536876188...
10/05/2011 17:41:01 2 nVolumesInInodeFile 56

The line about deleting the volume header file is disturbing.
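
For anyone who wants to check their own salvager log for the same two
symptoms (vnodes whose length was reset to 0, and volume headers deleted
as "not associated with any actual data"), here is a minimal sketch.  The
patterns are modelled only on the log lines quoted above, so other
versions may word the messages differently, and the function name
scan_salvage_log is just something made up for illustration.

```python
import re

# Patterns are assumptions based solely on the log excerpt in this email.
LENGTH_RE = re.compile(
    r"Vnode (\d+): length incorrect; changed from (\d+) to (\d+)"
)
HEADER_RE = re.compile(
    r"The volume header file (\S+) is not associated with any actual data"
)


def scan_salvage_log(lines):
    """Scan salvager log lines (hypothetical helper, not part of OpenAFS).

    Returns a pair:
      truncated -- list of (vnode, old_length) whose length was reset to 0
      orphaned  -- list of volume header paths reported as having no data
    """
    truncated = []
    orphaned = []
    for line in lines:
        m = LENGTH_RE.search(line)
        if m:
            vnode, old, new = (int(g) for g in m.groups())
            # Only report the worrying case: real data replaced by nothing.
            if new == 0 and old > 0:
                truncated.append((vnode, old))
            continue
        m = HEADER_RE.search(line)
        if m:
            orphaned.append(m.group(1))
    return truncated, orphaned
```

It takes any iterable of lines, so an open file object for the salvager
log can be passed in directly.  It does not track which volume each vnode
belongs to; that would need the "SALVAGING VOLUME" lines as well.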

Then, at the next attempt to reclone the volume (vos backupsys runs
automatically at 1am), this was logged:

Thu Oct  6 01:02:24 2011 VReadVolumeDiskHeader: Couldn't open header for volume 536876338 (errno 2).
Thu Oct  6 01:02:24 2011 1 Volser: Clone: Cloning volume 536876336 to new volume 536876338
Thu Oct  6 01:02:24 2011 SYNC_ask: negative response on circuit 'FSSYNC'
Thu Oct  6 01:02:24 2011 FSYNC_askfs: FSSYNC request denied for reason=65547
Thu Oct  6 01:02:24 2011 SYNC_ask: negative response on circuit 'FSSYNC'
Thu Oct  6 01:02:24 2011 FSYNC_askfs: FSSYNC request denied for reason=0
Thu Oct  6 01:02:24 2011 VAttachVolume: attach of volume 536876338 apparently denied by file server

Then the next day:

Fri Oct  7 01:01:58 2011 1 Volser: Clone: Recloning volume 536876287 to volume 536876289
Fri Oct  7 01:01:58 2011 SYNC_ask: negative response on circuit 'FSSYNC'
Fri Oct  7 01:01:58 2011 FSYNC_askfs: FSSYNC request denied for reason=0
Fri Oct  7 01:01:58 2011 VAttachVolume: attach of volume 536876338 apparently denied by file server
Fri Oct  7 01:01:58 2011 VCreateVolume: Header file /vicepa/V0536876338.vol already exists!
Fri Oct  7 01:01:58 2011 1 Volser: Clone: Couldn't create new volume; clone aborted

Of course I only noticed today...  (That's the problem when a professor
tries to administer a cell by himself.)
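
To catch this sort of thing sooner, a nightly check of the volserver log
for the failure messages shown above could be sketched roughly as
follows.  Again, the message patterns come only from the excerpts in this
email, and summarize_volser_log is a hypothetical name, not anything that
ships with OpenAFS.

```python
import re

# Patterns are assumptions based solely on the log excerpts in this email.
ATTACH_DENIED_RE = re.compile(
    r"VAttachVolume: attach of volume (\d+) apparently denied by file server"
)
HEADER_EXISTS_RE = re.compile(
    r"VCreateVolume: Header file (\S+) already exists!"
)
CLONE_ABORTED = "Clone: Couldn't create new volume; clone aborted"


def summarize_volser_log(lines):
    """Summarize clone failures in volserver log lines (hypothetical helper)."""
    denied = set()   # volume IDs whose attach was denied
    stale = set()    # header files that blocked creation of a new clone
    aborted = 0      # number of aborted clone operations
    for line in lines:
        m = ATTACH_DENIED_RE.search(line)
        if m:
            denied.add(int(m.group(1)))
            continue
        m = HEADER_EXISTS_RE.search(line)
        if m:
            stale.add(m.group(1))
            continue
        if CLONE_ABORTED in line:
            aborted += 1
    return {
        "denied_volumes": sorted(denied),
        "stale_headers": sorted(stale),
        "aborted_clones": aborted,
    }
```

A cron job could mail the result whenever aborted_clones is nonzero or
either list is non-empty.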

Actually, this seems like the same bug that caused the backup error
mentioned a while ago -- the backup volume is partially destroyed but
not completely.

As I said, I will try to build the RPMs from scratch.  Fortunately I
only need the openafs-server binaries, not the kernel modules.

Best regards,
John