[OpenAFS] Weirdness after 'vos move's - core files?

Garance A Drosehn drosih@rpi.edu
Sun, 19 Feb 2017 11:49:40 -0500

I'm not sure this mailing list is still working, but if it is, here is
some more info on some weird issues I'm seeing on one of our file servers
after upgrading to openafs 1.6.20-1.  Note that I upgraded three other
file servers to openafs 1.6.20-1 without any trouble.

To recap, I first moved the busiest volumes off the file server (in case
the upgrade took a long time).  I then installed a new Red Hat kernel
and the new version of openafs compiled for that kernel (the compile was
done on a different file server).  This is the same process I've followed
on the other file servers.

I then started moving volumes back to the now-upgraded file server.  The
first 26 volumes transferred fine, but the 27th failed, as did three other
volumes that I tried to move back.  I then took *all* the remaining
volumes (including the four that failed) and moved them to a different
file server, which had already been upgraded to openafs 1.6.20-1.  Those
moves went through without any problems.

On the file server which I couldn't move volumes to, a 'listvol' command
showed four "**** Could not attach volume <vid> ****" messages, one for
each volume that had failed.  I tried doing a "vos syncvldb" for each
of those volumes on all of our file servers, followed by a "vos syncserv"
on all our file servers.  This didn't get rid of the "Could not attach"
messages.

I rebooted the file server, which also didn't change anything.  I then
ran a "bos salvage", which took 30 minutes and finished okay.  Afterward,
the "Could not attach" messages were gone from "listvol".  So I tried to
"vos move" another volume from a different file server to the problematic
one.  That "vos move" also failed, and it left behind two core files in
/usr/afs/logs.  I'm now running another "bos salvage", just because those
"Could not attach" messages annoy me.

Is there something I could do with those core files that would help
figure out what the problem is with this file server?  I also have
plenty of log files, if those would provide some clues.

Garance Alistair Drosehn                =     drosih@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA