[OpenAFS] dot path times out after root.cell was moved to a different fileserver

Marc Schmitt mschmitt@inf.ethz.ch
Tue, 04 Nov 2003 15:08:56 +0100


Hi Horst,

Horst Birthelmer wrote:

>Hi,
>
>AFAIK you cannot remove the AFS kernel module on Linux. It always crashed 
>on my machines.
>
Hmm, I have not seen that problem for a long time. It used to happen 
regularly about two years ago, when OpenAFS was just born.

>
>I think your problem is some inconsistency of the VLDB.
>
We checked again; the VLDB appears to be consistent.

I've just received a call from one of our AFS admins. He said that he 
had found a Solaris machine running OpenAFS that showed the same 
problem I'm experiencing. He suggested issuing 'fs checkvol' on the 
clients, and it worked: I could access the dot path after that.

Looks like the following happens:
- client accessed dot path before Sunday -> client caches the volume's 
location as fileserver X
- root.cell was moved to Y on Sunday
- clients that had accessed root.cell before Sunday time out on dot path 
after Sunday
- clients that have not accessed root.cell since they were booted can 
access root.cell; they do not know that X used to be its fileserver
- 'fs checkvol' helps clients that are stuck to get access to the dot 
path again
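
The sequence above can be written down as a toy model in Python. This is 
only a sketch of my hypothesis, not OpenAFS code: ToyVLDB, ToyClient and 
their methods are made-up names, the VLDB is treated as the ground truth 
for where the volume lives, and the real cache manager is certainly more 
involved than a dictionary.

```python
# Toy model of the suspected stale-location behaviour. NOT OpenAFS code;
# every class and method name here is hypothetical.

class ToyVLDB:
    """Stands in for the volume location database (assumed ground truth)."""

    def __init__(self):
        self.location = {}

    def move(self, volume, server):
        # Record the server that currently hosts the volume.
        self.location[volume] = server

    def lookup(self, volume):
        return self.location[volume]


class ToyClient:
    """Stands in for a client that remembers where it last found a volume."""

    def __init__(self, vldb):
        self.vldb = vldb
        self.cached = {}

    def access(self, volume):
        # Ask the VLDB only if no location is cached for this volume.
        if volume not in self.cached:
            self.cached[volume] = self.vldb.lookup(volume)
        # A cached server that no longer hosts the volume is modelled
        # as a timeout.
        if self.cached[volume] != self.vldb.lookup(volume):
            return "timeout"
        return "ok"

    def checkvolumes(self):
        # Models the observed effect of 'fs checkvol': drop cached locations.
        self.cached.clear()


vldb = ToyVLDB()
vldb.move("root.cell", "X")

old_client = ToyClient(vldb)
old_client.access("root.cell")      # before Sunday: caches X

vldb.move("root.cell", "Y")         # Sunday: root.cell moved to Y

new_client = ToyClient(vldb)        # booted after the move
results = {
    "old_client": old_client.access("root.cell"),
    "new_client": new_client.access("root.cell"),
}

old_client.checkvolumes()
results["old_client_after_checkvol"] = old_client.access("root.cell")
print(results)
```

In this model the old client times out, the freshly booted client is 
fine, and the old client recovers once its cached location is dropped, 
which matches what we see here.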

What I don't understand is why this didn't work "out of the box". Moving 
volumes from one server to another is not really an atypical AFS 
operation. :)
Or is root.cell special in that sense?

    Marc