[OpenAFS] Clients are blocked with error code -3 of RXAFSCB_ProbeUuid

Benjamin Kaduk kaduk@mit.edu
Fri, 1 May 2020 23:01:26 -0700

On Tue, Apr 28, 2020 at 10:30:50AM +0800, huangql wrote:
> Hello Ben,
> Thank you for your reply.
> Actually, our farm has been experiencing this issue for some time, and we have spent a lot of time trying to figure it out. We found that when heavy I/O throughput consumes the network bandwidth and many network packets are lost, the issue becomes more serious. After we configured a separate network interface for the client machines in the NetInfo file, the symptoms improved, but the issue still exists.
> We all think this case is not handled well. The client should report "timeout" and exit rather than stay blocked.
> The OpenAFS versions we use are listed below:
> Server side: OpenAFS 1.6.11
> Client side: OpenAFS 1.6.23
> Any comments or suggestions would be appreciated.

Are you in a position to try a new major version on either or both
endpoints, or to rebuild with minor patches to the current tree?

I believe that the 1.8.x series has some fileserver performance
improvements that might be relevant, or one could try increasing the
constants used for various (e.g., "idle dead") timeouts in the client.