[OpenAFS] idle dead timeout issue?

Russ Allbery rra@stanford.edu
Wed, 04 Apr 2012 11:24:21 -0700


Jack Neely <jjneely@pams.ncsu.edu> writes:

> I've grabbed 1.6.1, read the release notes and saw some notes that
> probably apply to this situation.  I'm still unclear if the OpenAFS
> folks believe this issue is solved or just better.  In any case there's
> nothing like tossing it on one of the web servers and giving her a spin.

> Performance appears slightly better than on our other web servers.
> However, we are still getting periods where AFS takes anywhere from
> several seconds to 30 seconds to respond.  Then suddenly, all hanging
> AFS transactions return at the same time.

1.6.1 should be quite a bit better.  However, I'm fairly sure that not all
of the performance problems causing this behavior with web servers are
fixed, and that further work will be required even with a 1.6.1 server.
I just haven't seen anyone find and fix something substantial enough to
seem likely to be the cause of the problems we've been seeing.

We're still tracking similar performance issues here, although we've not
yet started deploying 1.6.1 (we're just getting ready to do that).

One thing that definitely helps is having more file servers.  When we
temporarily reduced our count of file servers by half, we were in misery,
even though the systems were generally not heavily loaded.  Now that we're
back up to a full server pool, we're seeing occasional problems, but
they're much rarer and not as much of a full panic situation.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>