[OpenAFS] AFS hangs, possible nat issues?

Steve Simmons scs@umich.edu
Fri, 14 May 2010 15:11:13 -0400


We've got a user here who's behind a firewall/NAT and having some
problems. The firewall is likely an aggressive one, as it protects a
hospital. He's running the stock Ubuntu 9.10 client, which he reports as
being OpenAFS version 1.4.11.

The symptom is that he gets periodic hangs when accessing files in AFS.
During those times his /var/log/messages shows a lot of sequences like
this:

May 14 11:33:57 minime kernel: [353065.752034] afs: Lost contact with
  file server 141.211.1.127 in cell umich.edu (all multi-homed ip
  addresses down for the server)
May 14 11:33:57 minime kernel: [353065.752039] afs: Lost contact with
  file server 141.211.1.127 in cell umich.edu (all multi-homed ip
  addresses down for the server)
May 14 11:34:13 minime kernel: [353081.773810] afs: file server
  141.211.1.127 in cell umich.edu is back up (multi-homed address; other
  same-host interfaces may still be down)
May 14 11:34:13 minime kernel: [353081.773815] afs: file server
  141.211.1.127 in cell umich.edu is back up (multi-homed address; other
  same-host interfaces may still be down)

On the server side we see msgs like:

  Fri May 14 11:33:08 2010 CB: ProbeUuid for <addr>:<port> failed -01

where the IP address is the firewall's and the port number is not a
standard AFS port. The port number also varies all over the map. Time
correspondence with the client-side messages is pretty strong.
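
To put numbers on the hangs, something like the following could pair each
"Lost contact" message with the next "is back up" message for the same file
server and report how long the outage lasted. This is only a rough sketch:
the function name and both regular expressions are my own, and it assumes
each kernel message sits on a single line in /var/log/messages (the excerpts
above are wrapped by the mail client).

```python
import re

# Both patterns are assumptions based on the message text quoted above.
# The bracketed number is the kernel timestamp, in seconds since boot.
DOWN_RE = re.compile(r"\[\s*(\d+\.\d+)\] afs: Lost contact with file server (\S+)")
UP_RE = re.compile(r"\[\s*(\d+\.\d+)\] afs: file server (\S+) in cell \S+ is back up")


def find_outages(lines):
    """Return a list of (server_ip, start_ts, duration_s) tuples.

    Repeated "Lost contact" or "back up" messages for the same server
    are collapsed: only the first of each run is used.
    """
    down_since = {}  # server ip -> timestamp of first "Lost contact"
    outages = []
    for line in lines:
        m = DOWN_RE.search(line)
        if m:
            # Keep the earliest timestamp if the message is repeated.
            down_since.setdefault(m.group(2), float(m.group(1)))
            continue
        m = UP_RE.search(line)
        if m and m.group(2) in down_since:
            start = down_since.pop(m.group(2))
            outages.append((m.group(2), start, float(m.group(1)) - start))
    return outages


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        for ip, start, duration in find_outages(log):
            print(f"{ip}: down at {start:.3f} for {duration:.1f}s")
```

Comparing those outage windows against the ProbeUuid failures in the
fileserver log would show whether the two really line up.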

At this point I'm guessing that the NAT box is dropping the mapping
between internal and external UDP ports. The 1.5.73 release notes
mention this issue, saying a UDP keepalive was added for just that reason.
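
For anyone unfamiliar with the idea, the general technique looks roughly
like the sketch below: send a small datagram from the same local UDP socket
at an interval shorter than the NAT's idle timeout, so the port mapping never
expires. This is not how OpenAFS does it (the real keepalives are Rx packets
sent by the cache manager, and I haven't read that code); the function name,
the one-byte payload, and the interval are all made up for illustration.

```python
import socket
import time


def send_keepalives(sock, server_addr, interval_s, count, payload=b"\x00"):
    """Send `count` small datagrams to server_addr, spaced interval_s apart.

    Reusing the same socket means every datagram leaves from the same
    local port, which is what keeps a NAT's UDP mapping refreshed.
    Returns the number of datagrams sent.
    """
    sent = 0
    for i in range(count):
        sock.sendto(payload, server_addr)
        sent += 1
        if i < count - 1:
            time.sleep(interval_s)  # no sleep after the final datagram
    return sent


if __name__ == "__main__":
    # Hypothetical values: the address is a documentation-range placeholder
    # and the 20 second interval is a guess at "shorter than the NAT timeout".
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_keepalives(s, ("192.0.2.1", 7000), interval_s=20, count=3)
    s.close()
```

The interval is the only interesting knob: it has to be shorter than
whatever idle timeout the NAT applies to UDP mappings, which for an
aggressive firewall may be well under a minute.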

Next step would be to have him try 1.5.74, but before he goes that far
I'd be interested in hearing from anyone who's seen similar problems,
and what, if anything, fixed them.

Steve