[OpenAFS] Mount point weirdness: fs lsm X, fs lq X return different volumes for same mount point.

Fri, 03 Oct 2008 10:46:27 -0600

Derrick Brashear wrote:
> I don't see fs checkv before the attempted fix. Was it?
Hi Derrick,

Sorry for the delay in responding.  A wooky ripped my left shoulder out of the 
socket (IOW I flipped a recumbent trike) and I was getting the good news 
(it can be scoped) and catching up on my 401-keg plan.

Anyway, we used fs checkv like it was going out of style.

The first fix, which I don't believe I mentioned, was to dump/restore 
the volume containing the mount point, and the volume mounted, to force 
new volIDs.  This worked, resolving the issue for all the various client 
types and versions, simultaneously, for the dumped/restored volumes.  I 
don't believe the dump/restore of the parent was necessary, but I did not 
test that independently.

I figured this would work as there was obviously some confusion about 
volIDs/names and I wasn't sure where the confusion might be.

Figuring it might be an oddity with the file servers, I bos restarted 
all of them.  I chose not to reboot as the critical volumes were 'back 
in place.'

The bos restart did not help, and I continued to see the same disparity 
between 'fs lsm' and 'fs lq' on about forty mount points.
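
For anyone wanting to hunt for the same disparity, here is a minimal sketch of the comparison in Python.  It only parses text you have already captured from 'fs lsmount' and 'fs listquota'; the output layouts it expects are my assumption of the usual format and may differ between client versions.  The function names are hypothetical, and stripping the .readonly/.backup suffix before comparing is likewise an assumption, made so that a replicated volume is not flagged as a mismatch.

```python
import re

def base_name(volume):
    """Strip a .readonly or .backup suffix (assumption: only the base name matters)."""
    for suffix in (".readonly", ".backup"):
        if volume.endswith(suffix):
            return volume[: -len(suffix)]
    return volume

def mounted_volume(lsmount_output):
    """Volume named by an `fs lsmount` line, without the #/% marker or cell prefix."""
    # Assumed layout: '<path>' is a mount point for volume '#[cell:]volume'
    match = re.search(
        r"is a mount point for volume '[#%](?:[^:']+:)?([^']+)'", lsmount_output
    )
    return match.group(1) if match else None

def quota_volume(listquota_output):
    """Volume name from the first data row of `fs listquota` output."""
    for line in listquota_output.splitlines():
        # Skip blank lines and the assumed "Volume Name ..." header row.
        if not line.strip() or line.startswith("Volume Name"):
            continue
        return line.split()[0]
    return None

def mount_point_disagrees(lsmount_output, listquota_output):
    """True when both outputs parse and name different base volumes."""
    lsm = mounted_volume(lsmount_output)
    lq = quota_volume(listquota_output)
    if lsm is None or lq is None:
        return False
    return base_name(lsm) != base_name(lq)
```

Feed it the two command outputs for each mount point; any path where mount_point_disagrees() returns True is a candidate for the problem described here.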

Since these were mostly disused volumes, I experimented with other 
approaches.

I tried salvaging the vol.ume containing the disobeyed mount point; no joy.

vos syncvldb/syncserv: no joy.

What proved fairly reliable (and it may have been just the second step 
that bailed me out):
    1. Remove mount point from volume A
    2. vos release volume A
    3. vos addsite newsite volume B (the mounted volume)
    4. vos release volume B
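
The four steps above can be written out as command lines.  This is a sketch in Python that only builds and prints the commands; it does not run anything.  The path, volume, server and partition names are placeholders, the function name is hypothetical, and I am assuming step 1 corresponds to 'fs rmmount' on the read-write path of volume A and that "newsite" means a server/partition pair passed to 'vos addsite'.

```python
import shlex

def workaround_commands(mount_path, volume_a, volume_b, new_server, new_partition):
    """Build the four steps as argv lists (nothing is executed here)."""
    return [
        # 1. Remove the mount point from volume A (assumed: via its read-write path).
        ["fs", "rmmount", "-dir", mount_path],
        # 2. Release volume A so the replicas pick up the change.
        ["vos", "release", "-id", volume_a],
        # 3. Add a new replication site for volume B, the mounted volume.
        ["vos", "addsite", "-server", new_server,
         "-partition", new_partition, "-id", volume_b],
        # 4. Release volume B.
        ["vos", "release", "-id", volume_b],
    ]

if __name__ == "__main__":
    # All values below are placeholders, not taken from the cell in question.
    steps = workaround_commands(
        "/afs/.example.org/proj/x", "proj", "proj.x", "fs2.example.org", "/vicepa"
    )
    for argv in steps:
        print(shlex.join(argv))
```

Printing rather than executing lets you review each command before trying it on a disused volume first.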

Note that all of these volumes were replicated, both volumes A and 
volumes B.

Note also that due to volume moves that did not fully complete there was 
an orphan for most if not all of the volumes B.  These moves were within 
a few hours of the incident.  I can get precise move times from logs if 
of interest.

Cleaning up the orphans was done concomitantly with other efforts and may 
have had an impact.

I will look for additional existing instances today, and if I find any I 
will not fix them -- perhaps we can use them for diagnosis.  (So of course 
now I regret fixing those volumes I fixed.  I love computing!)

Thanks.

Kim