[OpenAFS] ProbeUuid failed for host xxx.xxx.xxx.xxx:7001

Lester Barrows barrows@email.arc.nasa.gov
Wed, 30 Mar 2005 11:56:01 -0800


It seems I was barking up the wrong tree with the previous error, which 
confused the issue. The ProbeUuid error may have more to do with the problem. 
A more complete description, ideally with no ambiguity, is in order.

- Occasionally, when many small files are transferred quickly onto a volume, 
the server containing the volume will time out for one or more clients. These 
clients will no longer be able to access the server.

- A "Connection timed out" error is shown in a terminal session on an affected 
client when attempting to access a volume from the affected server, which has 
now become inaccessible.

- When a client can no longer access the affected server, the following entry 
comes up for the client system in the affected server's FileLog:

ProbeUuid failed for host xxx.xxx.xxx.xxx:7001

- Typing 'fs checkserver' on the affected client produces the following error:

These servers unavailable due to network or server problems:  [affected server 
hostname]

- Some other clients are able to access the server. I believe that this may be 
due to the unaffected clients not accessing the volumes which were under 
heavy use.

- Shutting down the AFS client and ensuring that the kernel module is removed, 
then restarting the AFS client does not allow the affected client to access 
the affected server.

- Restarting only the fs, volserver, and ptserver services on the affected 
server does not allow the affected client to access the server. Shutting down 
and then restarting the AFS service completely on the affected server also 
has no effect on the affected client.

- Rebooting the affected client computer does allow it to access the affected 
server.

- The servers are running OpenAFS 1.2.13; the affected client in this case is 
also running 1.2.13. Older clients have also shown this behavior in the past.

- The firewall allows traffic initiated by the client, and such traffic 
normally works. The issue recurs every few months.
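
For anyone comparing against their own server, the FileLog entries quoted 
above can be tallied per client with a short script. This is only a minimal 
sketch in Python: the function name count_probe_failures is hypothetical, and 
it assumes every entry has the form quoted above (address, colon, port).

```python
import re
from collections import Counter

# Matches the FileLog entry quoted in this message. The "address:port"
# layout is assumed from that single example line.
PROBE_RE = re.compile(r"ProbeUuid failed for host (\S+?):(\d+)")


def count_probe_failures(lines):
    """Return a Counter mapping client address -> number of ProbeUuid failures.

    `lines` is any iterable of log lines (an open file works).
    """
    counts = Counter()
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it the open FileLog, for example count_probe_failures(open(path)), 
and printing the result of .most_common() lists the clients the server failed 
to probe, most frequent first. The log location varies by installation, so 
path is left for the reader to supply.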

The affected system at this point is my workstation, and the affected server 
does not contain volumes which I need to access directly. Thus, I'm willing 
to keep it online until I can determine the cause of the issue. Does this 
issue sound familiar to anyone?

Regards,

Lester Barrows
Asani Solutions, LLC
Code TI Systems Group
NASA Ames Research Center