[OpenAFS] Doubts about OpenAFS implementation in a company
Stanisław Kamiński
stasheck.fora@gmail.com
Wed, 18 May 2011 12:05:15 +0200
First of all, hi to everyone - it's my first own topic here :-)
I'm working for a company of ~1000 ppl, with three offices in Poland and
three others in bordering countries. OpenAFS was introduced about 6 years ago,
when the company was quite a bit smaller, and the guy that did this left
no documentation and some of his design decisions are making me scratch
my head - that's part of the reason I'm writing this.
Other things that are important:
- about 2/3 of users work on Linux (CentOS) workstations, and their
homedirs are served from AFS
- 1/3 are Windows users
- Polish offices are connected using at least 10 Mbit symmetric links,
but the offices abroad might have much less. In one particular example,
the link is asymmetric 10/1 Mbit (d/u)
- there is a single AFS cell covering all the offices
- every office has its own db and fileserver (Debian 5/6)
- we rely on our partner to assign IP address space for us - net result
is that the weakest link location (10/1) has the lowest IP and there is
_nothing_ we can do about it
The last thing causes Ubik elections to constantly choose the server
located on the weakest link as sync site.
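
To illustrate what I mean, here is a minimal sketch of the "lowest IP wins"
preference as I understand it. It is only the address comparison, not
Ubik's actual voting code, and the server names and addresses are made up
(documentation ranges, not our real ones):

```python
import ipaddress

# Hypothetical db servers; names and addresses are placeholders.
db_servers = {
    "weak-link-office": "192.0.2.10",
    "office-b": "198.51.100.20",
    "office-c": "203.0.113.30",
}

def lowest_ip_site(servers):
    """Return the name of the server with the numerically lowest IPv4 address."""
    # Compare addresses as integers, not as strings ("10.x" sorts before "9.x" as text).
    return min(servers, key=lambda name: int(ipaddress.IPv4Address(servers[name])))

print(lowest_ip_site(db_servers))
```

With our address assignment, the office on the 10/1 link always comes out
of this comparison first.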
Also, we quite often have to move user volumes between different offices
- we've got quite a bit of rotation between them, say some 10-20 ppl per
week.
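
For a rough feeling of what such a move costs, a back-of-the-envelope
sketch. The 1000 MB volume size is purely an assumption for illustration,
and the figures are best case (no protocol overhead, link otherwise idle):

```python
def transfer_seconds(size_mb, link_mbit):
    """Best-case seconds to push size_mb megabytes over a link_mbit Mbit/s link."""
    return size_mb * 8 / link_mbit

# Assumed 1000 MB home volume; link speeds are the ones mentioned above.
for label, mbit in [("1 Mbit uplink", 1), ("10 Mbit link", 10)]:
    hours = transfer_seconds(1000, mbit) / 3600
    print(f"{label}: {hours:.1f} h")
```

So moving a volume of that size out of the office with the 1 Mbit uplink
would take over two hours even in the ideal case.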
Now, I've been assigned to improve AFS performance in any way possible.
It was very bad, then I changed server parameters to tune it to "large"
server options - that yielded an enormous speedup, but I still believe I can
get much more from the system.
There are two things that are, ahem, not as fast as one would like. The
worse one is directory traversal - moving between levels of directories
can take 5-10 seconds (on a workstation with 1 Gbit link to AFS server
in its location). The other one is the upload/download speed itself -
last time I measured, Windows client d/u was 2/5 MB/s - I think I can
get more than that.
As I'm currently making my way through "Managing AFS" by Richard
Campbell, I'm not yet fully up-to-speed on OpenAFS inner workings and
such. Right now I only want to ask: is the design of our AFS system
correct? Or did the guy introducing it make some short-sighted
projections which don't hold water in the current environment (as
described)? I'm talking here about the single-cell design - although I'm not
sure it's easy to move volumes between different cells.
The other thing I'm worried about: can it be that having the sync site on
the slowest uplink causes everything to slow down? Is there any way to get
some measurements for this?
Thanks for reading all of this and not falling asleep :-) And waiting
for your comments,
Stan