[OpenAFS] Doubts about OpenAFS implementation in a company

Stanisław Kamiński stasheck.fora@gmail.com
Wed, 18 May 2011 12:05:15 +0200


First of all, hi to everyone - this is the first topic I've started here :-)

I work for a company of about 1000 people, with three offices in Poland 
and three others in bordering countries. OpenAFS was introduced about 6 
years ago, when the company was quite a bit smaller. The guy who set it 
up left no documentation, and some of his design decisions have me 
scratching my head - that's part of the reason I'm writing this.

Other things that are important:
- about 2/3 of users work on Linux (CentOS) workstations, and their 
homedirs are served from AFS
- 1/3 are Windows users
- the Polish offices are connected by symmetric links of at least 10 Mbit, 
but the offices abroad might have much less. In one particular case, 
the link is asymmetric, 10/1 Mbit (d/u)
- there is a single AFS cell covering all the offices
- every office has its own db server and fileserver (Debian 5/6)
- we rely on our partner to assign IP address space to us - the net result 
is that the weakest-link location (10/1) has the lowest IP and there is 
_nothing_ we can do about it

That last point causes Ubik elections to consistently choose the server 
located on the weakest link as the sync site.
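
A minimal sketch of why this happens, assuming (as far as I understand 
it) that Ubik gives a tie-breaking advantage to the database server with 
the numerically lowest IP address. This is not the real Ubik voting 
code, only the ordering it favours; the server names and addresses 
below are made up for illustration and are not our real ones.

```python
import ipaddress


def preferred_sync_site(servers):
    """Return the name of the server with the numerically lowest IPv4 address.

    `servers` maps a (hypothetical) server name to its IPv4 address string.
    """
    return min(servers, key=lambda name: ipaddress.IPv4Address(servers[name]))


# Hypothetical db servers; the 10/1 Mbit office happens to get the lowest address.
db_servers = {
    "office-weak-link": "10.0.1.5",
    "office-b": "10.0.2.5",
    "office-c": "10.0.10.5",
}

# With all servers up, the lowest address is the preferred candidate.
print(preferred_sync_site(db_servers))
```

Addresses are compared as numbers rather than as strings, so 10.0.2.5 
sorts before 10.0.10.5.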

Also, we quite often have to move user volumes between offices - there 
is quite a bit of staff rotation between them, some 10-20 people per 
week.

Now, I've been assigned to improve AFS performance in any way possible. 
It was very bad; then I changed the server parameters to the "large" 
server options, which yielded an enormous speedup, but I still believe 
I can get much more out of the system.

There are two things that are, ahem, not as fast as one would like. The 
worse one is directory traversal - moving between directory levels 
can take 5-10 seconds (on a workstation with a 1 Gbit link to the AFS 
server at its own location). The other one is the upload/download speed 
itself - last time I measured, the Windows client d/u was 2/5 MB/s - I 
think I can get more than that.
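
To put repeatable numbers on the directory-traversal delay, something 
like the following could be run on a workstation. It is only a rough 
sketch: `time_listing` is a helper name made up here, and it times one 
listing pass (including a stat of each entry) of whatever path it is 
given, nothing AFS-specific.

```python
import os
import time


def time_listing(path):
    """Return (entry_count, seconds) for one listing pass over `path`.

    Each entry is stat-ed as well, roughly what `ls -l` or a file manager does.
    """
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)  # force a per-entry metadata lookup
            count += 1
    return count, time.perf_counter() - start
```

Calling it twice in a row on the same directory under /afs and 
comparing the two timings should give a hint whether the delay is in 
fetching from the server or is still there when the client cache is 
warm.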

As I'm currently making my way through "Managing AFS" by Richard 
Campbell, I'm not yet fully up-to-speed on OpenAFS inner workings and 
such. Right now I only want to ask: is the design of our AFS system 
correct? Or did the guy who introduced it make some short-sighted 
assumptions that don't hold water in the current environment (as 
described)? I'm talking here about the single-cell design - although 
I'm not sure it's easy to move volumes between different cells.

Another thing I'm worried about: could having the sync site on the 
slowest uplink be slowing everything down? Is there any way to get 
some measurements for this?

Thanks for reading all of this and not falling asleep :-) I'm looking 
forward to your comments,
Stan