[OpenAFS] AFS hangs, possible nat issues?
Steve Simmons
scs@umich.edu
Fri, 14 May 2010 15:11:13 -0400
We've got a user here who's behind a firewall/NAT and having some
problems. The firewall is likely an aggressive one, as it protects a
hospital. He's running the stock Ubuntu 9.10 client, which he reports as
being OpenAFS version 1.4.11.
The symptom is that he gets periodic hangs when accessing files in AFS.
During those times his /var/log/messages shows a lot of sequences like
this:
May 14 11:33:57 minime kernel: [353065.752034] afs: Lost contact with
file server 141.211.1.127 in cell umich.edu (all multi-homed ip
addresses down for the server)
May 14 11:33:57 minime kernel: [353065.752039] afs: Lost contact with
file server 141.211.1.127 in cell umich.edu (all multi-homed ip
addresses down for the server)
May 14 11:34:13 minime kernel: [353081.773810] afs: file server
141.211.1.127 in cell umich.edu is back up (multi-homed address; other
same-host interfaces may still be down)
May 14 11:34:13 minime kernel: [353081.773815] afs: file server
141.211.1.127 in cell umich.edu is back up (multi-homed address; other
same-host interfaces may still be down)
On the server side we see messages like:
Fri May 14 11:33:08 2010 CB: ProbeUuid for <addr>:<port> failed -01
where the IP address is the firewall's and the port number is not a
standard AFS port. The port number also varies widely from one message
to the next. The timestamps line up closely with the client-side hangs.
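
For anyone who wants to line up the two logs more precisely, here is a
rough sketch of how the client-side outage lengths could be pulled out of
the kernel timestamps. The sample text is the log excerpt above with the
wrapped lines rejoined; the regular expression and the outages() helper
are my own illustration and not part of any OpenAFS tool.

```python
import re

# Two of the client log lines quoted above, rejoined onto single lines.
SAMPLE = (
    "May 14 11:33:57 minime kernel: [353065.752034] afs: Lost contact with "
    "file server 141.211.1.127 in cell umich.edu (all multi-homed ip "
    "addresses down for the server)\n"
    "May 14 11:34:13 minime kernel: [353081.773810] afs: file server "
    "141.211.1.127 in cell umich.edu is back up (multi-homed address; other "
    "same-host interfaces may still be down)\n"
)

# Matches the kernel uptime stamp, the event type, and the server address.
EVENT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] afs: "
    r"(?:(?P<down>Lost contact with file server)|file server) "
    r"(?P<ip>\d+\.\d+\.\d+\.\d+)"
)


def outages(log_text):
    """Return (server_ip, seconds_down) for each down/up pair in the log."""
    down_since = {}
    result = []
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if not match:
            continue
        ts = float(match.group("ts"))
        ip = match.group("ip")
        if match.group("down"):
            # Keep the first "Lost contact" stamp; repeats are ignored.
            down_since.setdefault(ip, ts)
        elif "is back up" in line and ip in down_since:
            result.append((ip, round(ts - down_since.pop(ip), 3)))
    return result


if __name__ == "__main__":
    for ip, seconds in outages(SAMPLE):
        print(f"{ip} unreachable for about {seconds} s")
```

Run against the full /var/log/messages, this would give a list of outage
windows to compare against the ProbeUuid failure times on the server.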
At this point I'm guessing that the NAT box is dropping the mapping
between internal and external UDP ports. The 1.5.73 release notes
mention this issue and say a UDP keepalive was added for just that reason.
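
To illustrate the general idea of a UDP keepalive (this is not the
OpenAFS/Rx implementation, just a sketch of the technique): a client
sends a small datagram from the same local port at an interval shorter
than the NAT's idle timeout, so the mapping never expires. The function
names, payload, and interval below are all made up for the example, and
a loopback socket stands in for the file server.

```python
import socket
import time


def send_keepalives(sock, peer, interval_s, count, payload=b"\x00"):
    """Send `count` small datagrams to `peer`, pausing `interval_s` between them.

    Reusing one socket means every datagram leaves from the same local
    port, which is what keeps a single NAT mapping refreshed.
    """
    sent = 0
    for i in range(count):
        sock.sendto(payload, peer)
        sent += 1
        if i < count - 1:
            time.sleep(interval_s)
    return sent


def demo():
    """Loopback demonstration; returns (sent, observed_source_ports, local_port)."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # stands in for the server
        receiver.settimeout(1.0)
        sender.bind(("127.0.0.1", 0))     # stands in for the client
        # A real interval would be tens of seconds; it is tiny here so
        # the demonstration finishes quickly.
        sent = send_keepalives(sender, receiver.getsockname(), 0.01, 3)
        ports = set()
        for _ in range(sent):
            _, addr = receiver.recvfrom(16)
            ports.add(addr[1])
        return sent, ports, sender.getsockname()[1]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    sent, ports, local_port = demo()
    print(f"sent {sent} keepalives from port {local_port}; receiver saw {ports}")
```

If the mapping does expire between packets, the NAT will typically pick
a new external port for the next outbound packet, which would be
consistent with the varying port numbers in the server log.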
The next step would be to have him try 1.5.74, but before he goes that
far I'd be interested in hearing from anyone who has seen similar
problems and what, if anything, fixed them.
Steve