[OpenAFS] Mount point weirdness: fs lsm X, fs lq X return different volumes for same mount point.

Kim Kimball dhk@ccre.com
Mon, 29 Sep 2008 16:39:12 -0600


Had a weird one on Thursday, and am looking for any plausible 
explanation so I can close out the incident report. 

My best answer right now is NAFC (not an effing clue.)

I'm using "X-mounted" to mean that the volume named in the mount point is 
not the volume actually accessed at the mount point.

Probably relevant:  We were moving volumes to clear a file server, and 
noticed an unusual number of orphaned volumes. 

When I went to start 'vos zapping' the orphans, many of them turned out 
to be the volumes that were incorrectly showing up at a mount point.

Could it be that the 'vos move' failures that created the orphans are 
the proximate cause of the X-mounts?  If so, how could the two be related?

Any FC greatly appreciated.

Kim

====================================
Synopsis:

From any AFS client, the volume named in a mount point was not the 
volume actually accessed.


Initial symptom:
       web servers start puking when invoking perl modules
       cd to path where perl modules are expected, and instead of perl
           modules see a bunch of unrelated png libraries
       check mount point to volume containing perl modules, and mount
           point correctly names perl volume
       fs lq on mount point returns name of volume containing png
           libraries -- not the name of the volume specified in fs lsm

The diagnostic:
    fs lsm <path/mountpoint>   --> volumeA
    fs lq   <path/mountpoint>   --> volumeZ

Confirmation:
    cd <path/mountpoint>
    ls
            ----- returns list of files/directories stored in volumeZ

The mount point is correct; that is, fs lsm returns the expected volume 
name.
The volume accessed at the mount point is incorrect.
The files/directories in the incorrectly accessed volume are themselves 
correct (they are that volume's proper contents).
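
For anyone wanting to sweep a tree for this, the lsm-vs-lq comparison above 
could be scripted roughly as below. This is only a sketch: the function 
names are made up for illustration, it assumes the usual output formats 
("'<path>' is a mount point for volume '#<vol>'" from fs lsmount, and a 
header line followed by one data line from fs listquota), and it treats 
"vol" and "vol.readonly" as the same volume, which may or may not be what 
you want at your site.

```shell
#!/bin/sh
# Sketch: flag mount points where the volume named by "fs lsm" differs
# from the volume that "fs lq" reports is actually being accessed.
# Function names are hypothetical; output formats are assumed (see above).

# Extract the volume name from one line of "fs lsmount" output, e.g.
#   '/afs/cell/path' is a mount point for volume '#sw.perl'
# Drops the leading '#' or '%' and any "cell:" prefix.
lsm_volume() {
    printf '%s\n' "$1" |
        sed -n "s/.* is a mount point for volume '[#%]\(.*\)'\$/\1/p" |
        sed 's/^.*://'
}

# Extract the volume name from "fs listquota" output: the first field
# of the second line (the first line is the column header).
lq_volume() {
    printf '%s\n' "$1" | awk 'NR == 2 { print $1 }'
}

# Assumption: a mount point naming "vol" may legitimately resolve to
# "vol.readonly", so strip that suffix before comparing.
base_volume() {
    printf '%s\n' "$1" | sed 's/\.readonly$//'
}

# Compare the two answers for one mount point path.
check_mount() {
    named=$(lsm_volume "$(fs lsmount -dir "$1")")
    accessed=$(lq_volume "$(fs listquota -path "$1")")
    if [ "$(base_volume "$named")" = "$(base_volume "$accessed")" ]; then
        echo "OK       $1 -> $named"
    else
        echo "MISMATCH $1: lsm=$named lq=$accessed"
    fi
}
```

Calling check_mount with a mount point path on an AFS client would report 
whether the two commands agree for that path.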

-------------------------------
We turned up forty-plus instances of X-mounted (for lack of a better 
word) volumes.

The fix:
    remove the mount point
    release the volume (containing mount point)
    create same mount point
    release volume again

    vos addsite newserver newpart _mounted_ volume (as named in mount point)
    vos release _mounted_ volume
   
    fs checkv

After that we get the expected responses:
        fs lsm <path/mountpoint>   --> volumeA
        fs lq   <path/mountpoint>   --> volumeA
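
Spelled out with full command names, the fix sequence above would look 
something like the following. It is a dry-run sketch that only prints the 
commands so they can be reviewed first; print_fix and its five arguments 
are placeholders, not anything from our actual procedure. It also assumes 
the mount point is reached through a read-write path, since the containing 
volume is replicated.

```shell
#!/bin/sh
# Dry-run sketch of the fix sequence: prints the commands, runs nothing.
# print_fix and all five arguments are hypothetical placeholders.
print_fix() {
    parent_vol=$1   # volume that contains the mount point
    rw_path=$2      # read-write path to the mount point
    mounted_vol=$3  # volume named in the mount point
    new_server=$4   # file server for the new replica site
    new_part=$5     # partition on that server

    # Remove and re-create the mount point, releasing the parent each time.
    echo "fs rmmount -dir $rw_path"
    echo "vos release -id $parent_vol"
    echo "fs mkmount -dir $rw_path -vol $mounted_vol"
    echo "vos release -id $parent_vol"

    # Add a replica site for the mounted volume and release it.
    echo "vos addsite -server $new_server -partition $new_part -id $mounted_vol"
    echo "vos release -id $mounted_vol"

    # Make the client re-check its volume name/ID mappings.
    echo "fs checkvolumes"
}
```

Piping the printed output to a shell would run the steps in order, but 
reviewing them by hand first seems the safer choice.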

========================

Other efforts:

I did restart the fs instances on all file servers, suspecting some sort 
of off-by-one'ish glitch in some unknown index/table/?

The restarts had no impact.

A 'vos move' of the volume containing the mount point did not help.

-----------------