[OpenAFS] Mount point weirdness: fs lsm X, fs lq X return different volumes for same mount point.
Fri, 3 Oct 2008 12:56:18 -0400
If you have tcpdump data for cache manager <-> vlserver and cache
manager <-> fileserver traffic during one of these corruptions, that
could be very helpful. I've found tcpdump (or wireshark/tshark) to be
useful in tracking down issues like this because you can very quickly
see whether the problem is:
1- cache manager asking for the wrong thing to start with (possibly
cache corruption -- not conclusive, because you still have to determine
whether the cache manager received bad data and cached it, or whether
the cache manager itself 'broke' the data; picking one client, clearing
its cache, and then retrying can help answer that question). Note that
in your case this is pretty unlikely, given that you saw it across
multiple clients on multiple OSes.
2- vlserver giving a wrong answer
3- neither of the above, which means the fileserver is giving a wrong answer.
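
To make the capture and the three-way split above a bit more concrete,
here is a rough Python sketch. It assumes the default AFS UDP ports
(7000 fileserver, 7001 cache manager callback, 7003 vlserver); a site
running on non-default ports would need to adjust them. The names
capture_command, Trace and triage are made up for illustration, and the
Trace fields are meant to be filled in by hand after reading the
capture in wireshark/tshark, not produced by any AFS tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: default AFS UDP ports (fileserver, callback, vlserver).
AFS_PORTS = (7000, 7001, 7003)


def capture_command(interface: str, outfile: str) -> list[str]:
    """Build a tcpdump argv that writes full AFS packets to a file."""
    bpf = "udp and (" + " or ".join(f"port {p}" for p in AFS_PORTS) + ")"
    # -s 0 keeps whole packets; -w writes raw packets for later analysis.
    return ["tcpdump", "-i", interface, "-s", "0", "-w", outfile, bpf]


@dataclass
class Trace:
    """Hypothetical hand-written summary of one lookup seen in the capture."""
    mount_point_volume: str   # volume the mount point names (e.g. from fs lsm)
    requested_volume: str     # volume the cache manager asked the vlserver about
    correct_volume_id: int    # ID that volume is known to have
    vlserver_volume_id: int   # ID the vlserver actually returned


def triage(t: Trace) -> str:
    """Apply the three cases from the list above, in order."""
    if t.requested_volume != t.mount_point_volume:
        return "cache manager"   # case 1: wrong request to start with
    if t.vlserver_volume_id != t.correct_volume_id:
        return "vlserver"        # case 2: vlserver gave a wrong answer
    return "fileserver"          # case 3: neither of the above
```

The argv from capture_command can be handed to subprocess.run (as root)
on the client while reproducing the problem. Note that a "cache manager"
result still leaves open whether the client was handed bad data earlier
or corrupted it locally, which is where clearing one client's cache and
retrying comes in.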
The usual suspects (e.g., cmdebug) are also helpful here. If this turns
out to be case 3 above, it might also be useful to get the callback
state from the fileservers to see what they think the cache managers
have cached.
Given that 'failed volume moves' seem to have been a trigger for
this, logfiles might have something interesting, especially if you can
provide volume names and volume IDs for the X volumes.
End Point Corporation