[OpenAFS] Security Advisory 2016-003 and 'bos salvage' questions

Garance A Drosehn drosih@rpi.edu
Wed, 15 Feb 2017 13:48:52 -0500


I had an odd situation pop up when upgrading to OpenAFS 1.6.20.1.
The description of the security advisory at
   http://www.openafs.org/pages/security/OPENAFS-SA-2016-003.txt
says:

> We further recommend that administrators salvage all volumes with
> the -salvagedirs option, in order to remove existing leaks.

I'm upgrading our file servers from OpenAFS 1.6.14 to 1.6.20.1 (and
also upgrading our RHEL 7.3 kernel from version 3.10.0-327.el7 to
version 3.10.0-514.6.1.el7 on those machines).

We had some other changes going on as part of this, and I wanted to
minimize how much disruption might occur if something went wrong
with any of the changes.  So I:

   1) created a small-capacity file server with the new OpenAFS
      and newer kernel.
   2) 'vos move'-ed some of the busier non-replicated volumes from
      an existing file server to the new file server.
   3) upgraded that existing file server.
   4) moved volumes back.
 ...  wait a few days, and repeat for a different file server.
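For reference, a move-back loop like step 4 might be sketched in
Python as below.  This is not the actual script; the function names
are hypothetical, and it assumes 'vos' is on the PATH and is run with
admin tokens.  It only uses the standard 'vos move' options (-id,
-fromserver, -frompartition, -toserver, -topartition).

```python
import subprocess
from typing import List, Optional, Tuple


def vos_move_argv(volume: str, from_server: str, from_part: str,
                  to_server: str, to_part: str) -> List[str]:
    """Build the argument list for a single 'vos move' (hypothetical helper)."""
    return ["vos", "move",
            "-id", volume,
            "-fromserver", from_server,
            "-frompartition", from_part,
            "-toserver", to_server,
            "-topartition", to_part]


def move_back(volumes: List[str], from_server: str, from_part: str,
              to_server: str, to_part: str,
              run=subprocess.run) -> Tuple[List[str], Optional[str]]:
    """Move volumes one at a time, stopping at the first failed move.

    Returns (volumes moved so far, volume that failed or None).
    'run' is injectable so the loop can be exercised without AFS.
    """
    moved: List[str] = []
    for vol in volumes:
        result = run(vos_move_argv(vol, from_server, from_part,
                                   to_server, to_part))
        if result.returncode != 0:
            # Stop here so a volume that needs salvaging is not skipped silently.
            return moved, vol
        moved.append(vol)
    return moved, None
```

Stopping at the first non-zero exit status is a design choice: it
leaves the remaining volumes on the temporary server until the failed
one has been looked at.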

There's no step there where I did a 'bos salvage -salvagedirs'.  I
had forgotten about that advice in the advisory.  I did all these
steps for a few file servers with no problems at all.

On my most recent server, the script that does step #4 moved 26
volumes, and then hit this error on the 27th one:

>    Failed to move data for the volume 53.....75
>       VOLSER: Problems encountered in doing the dump !
>      Recovery:Failed to start transaction on 53.....75
>       Volume needs to be salvaged

Is this simply that 'vos move' in version 1.6.20.1 is doing more
consistency checks than version 1.6.14 did?  The last-update time
for that volume is in Oct 2008, so it's not like it has been
changed recently.  And the new fileserver for temp-storage hasn't
even restarted since I started this upgrade.  The volume is
still on-line for AFS users, and at the user level it seems to
be in fine shape.  From what little I can tell, I can access all
47,000 files without any errors.

Is it reasonable for me to just do the 'bos salvage' for this
specific volume, then do the 'bos salvage -salvagedirs' for the
entire partition on the temp-space fileserver, and then do that
on the existing (newly upgraded) fileserver, and then restart
my 'vos move' runs?
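To make the proposed order concrete, here is a Python sketch that
only builds (and prints) the 'bos salvage' command lines, without
running anything.  The server and partition names are hypothetical
placeholders, and whether this sequence is sufficient is exactly the
question being asked.  The options used (-server, -partition,
-volume, -salvagedirs, -orphans) are ones listed by
'bos salvage -help'.

```python
from typing import List, Optional


def bos_salvage_argv(server: str, partition: str,
                     volume: Optional[str] = None,
                     salvagedirs: bool = False,
                     orphans: Optional[str] = None) -> List[str]:
    """Build one 'bos salvage' command line (hypothetical helper)."""
    argv = ["bos", "salvage", "-server", server, "-partition", partition]
    if volume is not None:
        argv += ["-volume", volume]   # salvage just this one volume
    if salvagedirs:
        argv.append("-salvagedirs")   # rebuild directories, per the advisory
    if orphans is not None:
        argv += ["-orphans", orphans]  # e.g. "ignore" for a first run
    return argv


def salvage_plan(volume: str, temp_server: str, temp_part: str,
                 home_server: str, home_part: str,
                 orphans: Optional[str] = None) -> List[List[str]]:
    """The three steps proposed above, in order."""
    return [
        # 1) the one volume that failed to move (currently on the temp server)
        bos_salvage_argv(temp_server, temp_part, volume=volume),
        # 2) the whole partition on the temp-space fileserver
        bos_salvage_argv(temp_server, temp_part, salvagedirs=True,
                         orphans=orphans),
        # 3) the whole partition on the newly upgraded fileserver
        bos_salvage_argv(home_server, home_part, salvagedirs=True,
                         orphans=orphans),
    ]


if __name__ == "__main__":
    for cmd in salvage_plan("VOLUME_ID", "tempfs", "/vicepa",
                            "fs1", "/vicepa", orphans="ignore"):
        print(" ".join(cmd))
```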

Also, can I do the 'bos salvage -salvagedirs' while the 'fs'
process is running, or do I need to stop and restart 'fs'
around that salvage command?

I ran 'bos salvage -help', and saw that a number of interesting
options are available that I don't see in the man page or the
documentation.  I'm inclined to go with '-orphans ignore' for
a first-run, and then see what is listed for orphans.  I'm also
curious about the '-nowrite' option.  Will that do a thorough
check of what 'salvage' would need to do, but without making
any modifications to the volume?

-- 
Garance Alistair Drosehn                =     drosih@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA