[OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

Ken Elkabany Ken@Elkabany.com
Sun, 10 May 2009 18:53:27 -0700


Hello,

I have openafs 1.4.9 client and server running on two separate
machines across a WAN. The client has scripts that access the
/afs/our.cell/ directory. Occasionally, a script will fail to
complete, and the logs will report "Connection Timed Out" on a
"mkdir -p /afs/our.cell/x/y/z" command. The errors occur
approximately 1 time in 100: rarely enough that they are not easily
reproducible manually, but often enough to hamper our project. The
scripts run as the root user, and are guaranteed to have the proper
ticket and token. It's
also important to note that these scripts often run in parallel (4 at
a time, all root, modifying our cell). When one fails, all scripts
running concurrently fail with the same error, and to recover I
typically either run unlog;kdestroy or restart the openafs-client (I
am unsure which of those steps is necessary or sufficient). I will
soon have an additional LAN setup, and will determine whether the
same error occurs there. Has anyone dealt with this issue before?
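
For reference, the failing step could be wrapped roughly as below to
find out whether the timeout is transient. This is only a sketch: the
function name retry_mkdir and the MAX_TRIES and DELAY values are
hypothetical, not taken from the actual scripts, and a retry may not
help if the client stays stuck until it is restarted.

```shell
#!/bin/sh
# Hypothetical helper: retry "mkdir -p" a few times and log each failure.
# MAX_TRIES and DELAY are illustrative values, not tuned recommendations.
MAX_TRIES=${MAX_TRIES:-3}
DELAY=${DELAY:-5}

retry_mkdir() {
    dir=$1
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        # mkdir prints its own error (e.g. "Connection timed out") to stderr.
        if mkdir -p "$dir"; then
            return 0
        fi
        echo "$(date) mkdir -p $dir failed (attempt $try of $MAX_TRIES)" >&2
        try=$((try + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$try" -le "$MAX_TRIES" ]; then
            sleep "$DELAY"
        fi
    done
    return 1
}
```

A script would then call "retry_mkdir /afs/our.cell/x/y/z || exit 1",
and the timestamps on stderr would show whether a later attempt
succeeds or whether every attempt fails until the client is reset.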

Thank you for the assistance,

Ken