[OpenAFS] Re: fs: server not responding promptly

Andrew Deason adeason@sinenomine.net
Thu, 10 Feb 2011 16:36:28 -0600


On Thu, 10 Feb 2011 15:07:49 -0600
John Tang Boyland <boyland@pabst.cs.uwm.edu> wrote:

> But all evidence is pointing back that this is just a known problem:
> the server will hang on writes to a volume while waiting to break
> callbacks.  But the resulting behavior is very annoying and has bad
> effects (if the hang is long enough there are I/O errors and
> applications start to fail).

Well, to some extent this is unsolvable from the server side. If you
change a file, you must contact the clients that have it cached, or they
will have stale data. And we need to wait a few seconds to contact the
other clients if they are down, to give them a chance to respond. While
Rainer describes something that reduces the upper limit of how bad this
gets, a delay of a few seconds for a client that has disappeared is
unavoidable.

If you have control over the NATs the clients are behind, fix the NATs.
If you don't but have control over the clients, you can have them run
'fs checks' every minute or so, depending on what the UDP mapping
timeout is on the NAT. Or you can have them run clients in the 1.6
series, which, IIRC, includes more periodic traffic to keep NAT port
mappings alive.
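
As a rough sketch of the 'fs checks' approach: the loop below reruns the
command at a fraction of the NAT's UDP mapping timeout. The function
names, the 0.5 safety factor, and the one-second floor are assumptions
made for illustration, not anything OpenAFS ships. 'fs checks' is the
abbreviated form of 'fs checkservers'; a cron job would do just as well.

```python
import subprocess
import time


def keepalive_interval(nat_timeout_s, safety=0.5):
    """Return how often (in seconds) to probe, given the NAT's UDP mapping timeout.

    safety is an assumed margin: probe well before the mapping would expire.
    """
    if nat_timeout_s <= 0:
        raise ValueError("nat_timeout_s must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")
    # Never go below one second, to avoid hammering the fileservers.
    return max(1.0, nat_timeout_s * safety)


def run_keepalive(nat_timeout_s, rounds=None, run=subprocess.call, sleep=time.sleep):
    """Run 'fs checkservers' repeatedly; rounds=None means loop forever."""
    interval = keepalive_interval(nat_timeout_s)
    done = 0
    while rounds is None or done < rounds:
        # Output is left on stdout/stderr; the point is only the periodic traffic.
        run(["fs", "checkservers"])
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return done
```

For a NAT that drops UDP mappings after 120 seconds (an assumed figure;
check what yours actually does), run_keepalive(120) would run the check
every 60 seconds.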

If none of that is feasible, you can block the clients, since they're
behind a broken part of the network. While we don't provide a facility
for blacklisting "bad" clients (though this has been asked for), you can
just do this with a firewall.
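
For illustration only, here is one way the firewall approach might look,
assuming a Linux fileserver that uses iptables (nothing above says which
firewall you have). The helper name and the example address are
hypothetical; 7000/udp is the standard AFS fileserver port. The function
only builds the command line, so you can inspect it before running it
as root.

```python
import ipaddress

FILESERVER_PORT = 7000  # standard AFS fileserver UDP port


def drop_rule(client_ip, chain="INPUT"):
    """Build an iptables argv that drops fileserver traffic from one client."""
    # Validate the address so a typo does not end up in a firewall rule.
    addr = ipaddress.ip_address(client_ip)
    if addr.version != 4:
        raise ValueError("iptables handles IPv4 only")
    return [
        "iptables", "-A", chain,
        "-s", str(addr),
        "-p", "udp", "--dport", str(FILESERVER_PORT),
        "-j", "DROP",
    ]
```

Calling drop_rule("192.0.2.10") (a documentation-range placeholder
address) and joining the result with spaces gives the rule to review.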

> Are we just the only place with a significant number of AFS clients
> behind poorly behaved NAT routers?

You probably have a higher percentage of clients behind NATs, and
certainly behind misbehaving ones, so it's much more visible to you. But
no, you're not the only person to complain about this :)

-- 
Andrew Deason
adeason@sinenomine.net