[OpenAFS] Mount point weirdness: fs lsm X, fs lq X return different
volumes for same mount point.
Kim Kimball
dhk@ccre.com
Fri, 03 Oct 2008 10:46:27 -0600
Derrick Brashear wrote:
> I don't see fs checkv before the attempted fix. Was it?
Hi Derrick,
Sorry for the delay in responding. A Wookiee ripped my left shoulder out of
its socket (IOW, I flipped a recumbent trike), and I was getting the good
news (it can be scoped) and catching up on my 401-keg plan.
Anyway, we used fs checkv like it was going out of style.
The first fix, which I don't believe I mentioned, was to dump/restore both
the volume containing the mount point and the mounted volume, to force new
volIDs. This worked, resolving the issue simultaneously for all the various
client types and versions on the dumped/restored volumes. I don't believe
the dump/restore of the parent was necessary, but I didn't test that
independently.
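Roughly, for each affected volume, the cycle looked like the following
(server, partition, volume, and file names are placeholders, and the exact
flags are from memory -- removing the old RW first is, I believe, what lets
the restore allocate a fresh volume ID):

   vos dump -id proj.foo -file /tmp/proj.foo.dump -time 0
   vos remove -server fs1 -partition /vicepa -id proj.foo
   vos restore -server fs1 -partition /vicepa -name proj.foo -file /tmp/proj.foo.dump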
I figured this would work as there was obviously some confusion about
volIDs/names and I wasn't sure where the confusion might be.
Figuring it might be an oddity with the file servers, I bos restarted
all of them. I chose not to reboot as the critical volumes were 'back
in place.'
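That was just the stock restart of the server processes on each file
server, i.e. something like (hostname is a placeholder):

   bos restart -server fs1.example.com -all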
The bos restart did not help, and I continued to see the same disparity
between 'fs lsm' and 'fs lq' on about forty mount points.
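To be concrete about the symptom (cell, path, and volume names here are
made up for illustration): for an affected directory,

   fs lsmount -dir /afs/example.com/proj/foo
   fs listquota -path /afs/example.com/proj/foo

named two different volumes -- the volume in the mount point did not match
the volume reported in the listquota output.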
Since these were mostly disused volumes, I experimented with other
approaches.
I tried salvaging the volume containing the misbehaving mount point; no joy.
vos syncvldb / vos syncserv: no joy.
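For completeness, those attempts looked roughly like this (server,
partition, and volume names are placeholders):

   bos salvage -server fs1 -partition /vicepa -volume proj.foo
   vos syncvldb -server fs1
   vos syncserv -server fs1

Neither pass changed what the clients reported.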
The following was fairly reliable, and it may have been just the second
step that bailed me out (a rough command sketch appears after the list):
1. Remove mount point from volume A
2. vos release volume A
3. vos addsite newsite volume B (the mounted volume)
4. vos release volume B
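In command terms the sequence was approximately this, with names as
placeholders (volume B is the volume the mount point refers to):

   fs rmmount -dir /afs/example.com/proj/foo          # 1. remove mount point from volume A
   vos release A                                      # 2. release volume A
   vos addsite -server fs2 -partition /vicepb -id B   # 3. add a new RO site for B
   vos release B                                      # 4. release volume B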
Note that all of these volumes were replicated -- both the A volumes and
the B volumes.
Note also that, due to volume moves that did not fully complete, there was
an orphan for most if not all of the B volumes. These moves happened within
a few hours of the incident. I can get precise move times from the logs if
they're of interest.
Cleaning up the orphans was done concurrently with the other efforts and
may have had an impact.
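For anyone hitting the same thing: the usual way to clear such an orphan --
not necessarily exactly what we ran here, and the server, partition, and
volume ID below are placeholders -- is to zap the stray instance directly
on the server that still holds it:

   vos zap -server oldfs1 -partition /vicepa -id 536870999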
I will look for additional existing instances today, and if I find any I
will not fix them -- perhaps we can use them for diagnosis. (So of course
now I regret fixing the volumes I already fixed. I love computing!)
Thanks.
Kim