[OpenAFS-devel] fileserver loop

Russ Allbery rra@stanford.edu
Wed, 21 Aug 2002 12:22:16 -0700


Nickolai Zeldovich <kolya@MIT.EDU> writes:

> The symptoms sound really similar to the asymmetric client lossage.
> That bug is fixed in the mainline of OpenAFS (and the cells at MIT have
> been running with this patch for a while now), but it doesn't look like
> it was pulled up to the 1.2.x branch.  The problem comes up when some
> client is able to send packets to the server, but the server is unable
> to send packets back to the client (because of a firewall, or some other
> misconfiguration).  This ties up the server's worker threads for a long
> time as the server tries to contact the client.  If the client sends new
> requests sufficiently often (e.g., the Windows AFS client, whose
> timeouts are much lower than those of the UNIX client), the server runs
> out of worker threads.

Yes, that's *exactly* the situation that we had before with the guy
running ZoneAlarm.  We're starting to deploy the Windows AFS client more
broadly, so the timing with the upgrade to OpenAFS 1.2.6 may be pure
coincidence, and the actual problem may be that more people are running
Windows AFS under weird network conditions.
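
For anyone trying to picture the failure mode described in the quoted
paragraph, here is a rough discrete-time sketch of it in Python.  It is
only a toy model, not the actual Rx or fileserver code: the function name
simulate and all of the numbers (12 worker threads, a 120-second wait per
unreachable client, 5- and 30-second retry intervals) are made up for
illustration.  It assumes each request from a half-reachable client holds
one worker until the server gives up trying to contact that client.

```python
def simulate(num_workers, request_interval, callback_timeout, duration):
    """Toy model of worker exhaustion caused by a half-reachable client.

    All parameters are hypothetical illustration values in seconds
    (except num_workers).  The client sends one request every
    request_interval seconds.  Each request occupies a worker for
    callback_timeout seconds while the server waits for a reply that
    never arrives.  Returns the time at which every worker is busy,
    or None if that never happens within duration.
    """
    busy_until = []  # release times of the currently occupied workers
    t = 0
    while t < duration:
        # Free any worker whose wait on the client has expired by now.
        busy_until = [release for release in busy_until if release > t]
        # A new request arrives and takes a free worker if there is one.
        if len(busy_until) < num_workers:
            busy_until.append(t + callback_timeout)
        if len(busy_until) == num_workers:
            return t  # no worker left for any other client
        t += request_interval
    return None


# Impatient client: retries every 5 s, workers are held for 120 s each.
print(simulate(12, 5, 120, 600))
# Patient client: retries every 30 s, so workers free up in time.
print(simulate(12, 30, 120, 600))
```

In this model the number of workers tied up levels off at roughly the
timeout divided by the retry interval, so the pool is exhausted only when
that ratio reaches the number of workers.  That matches the observation
that a client with short timeouts is the one that triggers the problem.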

> If you're interested, the deltas on the mainline for this bugfix are:

>   rx-protect-servers-from-half-reachable-clients-20020119
>   rx-cleanup-deadlock-and-refcnt-leak-20020121
>   better-protection-against-asymmetric-clients-20020222
>   minor-rx-lock-cleanup-20020330
>   clear-attachwait-flag-20020403

Cool, thank you!

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>