[OpenAFS] Quick "Connection timed out" (client OpenAFS 1.6.0-1-debian, server SELinux OpenAFS 1.6.1)

John Tang Boyland boyland@uwm.edu
Fri, 17 May 2013 16:03:57 -0500


I've had occasional problems where I get a very quick "connection timed out"
when trying to write files.  Here are the symptoms:

$ tar xvf /tmp/cs854.mail.tar 2>&1 | more
./
./1
./.mh_sequences
./2
./3
./4
./5
...
./295
tar: ./295: Cannot close: Connection timed out
./296
tar: ./296: Cannot open: File exists
./297
tar: ./297: Cannot open: Connection timed out
./298
tar: ./298: Cannot open: Connection timed out
... (more of the same)
./606
tar: ./606: Cannot open: Connection timed out
./607
tar: ./607: Cannot open: Connection timed out
tar: .: Cannot stat: Connection timed out
tar: Exiting with failure status due to previous errors

(The "File exists" error is partly because I had tried touching 296
and 297 to see whether that would avoid the "connection timed out" message.)

Several interesting aspects:
(1) The scenario is utterly repeatable.  It always dies on 295.
(2) The "connection timed out" error appears without any noticeable
    delay; whatever timeout is involved must be a fraction of a second.
(3) The fileserver's FileLog shows nothing that looks relevant:

...
Fri May 17 10:42:30 2013 CB: RCallBackConnectBack (host.c) failed for host 24.209.180.128:7001
Fri May 17 10:45:29 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55018) failed -1
Fri May 17 10:49:30 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55040) failed -1
Fri May 17 10:53:31 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55009) failed -1
Fri May 17 10:57:32 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55011) failed -1
Fri May 17 11:01:33 2013 CB: ProbeUuid for host 083D6820 (24.131.85.141:55063) failed -1

The server and client are both on 129.xx.xx.xx

The server is, I think, running Scientific Linux (the fileserver's
build string below mentions sl6).  The client is running Ubuntu.

$ uname -a
Linux pabst 3.0.0-14-generic #23somerville3-Ubuntu SMP Mon Dec 12 09:20:18 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ rxdebug localhost -port 7001 -version
Trying 127.0.0.1 (port 7001):
AFS version:  OpenAFS 1.6.0-1-debian built  2013-03-21 
$ rxdebug <fileserver> -port 7000 -version
Trying 129.xx.xx.xx (port 7000):
AFS version:  OpenAFS 1.6.1 built 2013-03-01 (114.sl6@fnal.gov)

(Since writing the first part of this email, I have
discovered the same repeatable problem when trying to clone
a git repository.  In both cases, then, the file server
is being asked to write many, many files in quick
succession.)
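For anyone who wants to try reproducing this without my tar file, here is
a minimal sketch of the pattern that seems to trigger it: write many small
files into one directory in quick succession, stop at the first error, and
report which file failed, with which errno, and how long the failing call
took.  This is a hypothetical test script of my own (the name write_many and
the defaults of 1000 files of 4 KB are arbitrary assumptions), not anything
from the OpenAFS tools, and it is untested against AFS itself.

```python
import errno
import os
import sys
import time


def write_many(directory, count=1000, size=4096):
    """Create files named 1..count in `directory`, stopping at the first OSError.

    Hypothetical helper (not part of OpenAFS). Returns None if every file was
    written, otherwise a tuple (index, errno_name, seconds_in_failing_call).
    """
    payload = b"x" * size
    for i in range(1, count + 1):
        path = os.path.join(directory, str(i))
        start = time.monotonic()
        try:
            # open, write and the implicit close all happen inside the try,
            # so a failure at close time (like tar's "Cannot close") is caught too
            with open(path, "wb") as f:
                f.write(payload)
        except OSError as exc:
            elapsed = time.monotonic() - start
            # map the numeric errno to its symbolic name, e.g. ETIMEDOUT
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return i, name, elapsed
    return None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: write_many.py DIRECTORY")
    else:
        result = write_many(sys.argv[1])
        if result is None:
            print("all files written")
        else:
            index, name, seconds = result
            print(f"file {index} failed with {name} after {seconds:.3f}s")
```

Pointed at a scratch directory inside AFS, a run that hits this problem
should fail at a consistent file number with ETIMEDOUT and an elapsed time
well under a second, matching symptoms (1) and (2).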

Since this problem is repeatable, I'd be happy to run some
diagnostics, assuming it isn't already a known bug.
If you email me directly, you may get some strange bounce messages.
Despite the notices saying the mail was rejected, I do still
receive it, just not at the most useful place.

Best regards,
John Boyland