[OpenAFS] dot path times out after root.cell was moved to a different fileserver

Marc Schmitt mschmitt@inf.ethz.ch
Tue, 04 Nov 2003 14:19:23 +0100


Hi all,

We're having problems with our Linux clients after the root.cell volume 
was moved to a different fileserver and, at the same time, the original 
fileserver became a DB server only.

Last Sunday, the root.cell volume was moved from server X to server Y. 
Then server X was removed from the list of fileservers and is now a DB 
server only. Today, I happened to access the dot path on several Linux 
clients (RedHat 7.3, 2.4.20-20.7, OpenAFS 1.2.10) and saw:

ls: /afs/.ethz.ch: Connection timed out

On the console of the clients, I then get:

afs: Lost contact with file server X in cell ethz.ch (all multi-homed ip 
addresses down for the server)

Somehow, the Linux clients still expect root.cell to be on fileserver X, 
which looks like a client bug. Rebooting the clients solves the problem, 
but that is hardly practical. Interestingly, I do not see this problem 
on our Solaris machines (running OpenAFS and TransarcAFS).

Is there a way to "purge" X from the clients' fileserver list w/o having 
to reboot them? I tried to restart afsd, but the kernel module appeared 
to be busy and the service could not be stopped (maybe because it is 
waiting for X to come back?).

TIA

    Marc