[OpenAFS] OpenAFS in a production environment

Lester Barrows barrows@email.arc.nasa.gov
Thu, 1 Sep 2005 19:48:16 -0700

Hi Jeffrey,

On Thursday 01 September 2005 6:43 pm, you wrote:
> There are certainly some performance issues, but they're rather more
> complex than is suggested here.  If it were easy, we'd have fixed it by
> now.

Sure, the protocols behind AFS are almost certainly more complex than my
understanding of them. I'm simply sharing my observations of OpenAFS
performance based on our usage over the past several years.

> OpenAFS _clients_ work fine behind a NAT that provides reasonable
> connection tracking and does not time out UDP port associations too
> quickly.  For those that do time out such associations quickly, it is
> possible to increase the frequency with which the cache manager polls the
> fileserver, resulting in a "keep-alive" effect, but this has the
> disadvantage of additional load on the network and fileservers.

Multiple OpenAFS clients work poorly behind any NAT I've ever put them
behind, whether hardware such as Cisco or Foundry routers or software such
as iptables with the Linux kernel. There may be a few types of NATs which
work properly, and increasing the polling frequency may indeed help, but from
an architectural standpoint I wouldn't recommend placing several AFS clients
behind a NAT. In my experience it's simply asking for trouble, which is the
context in which my response was written.
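
To illustrate the timeout problem described in the quoted text, here is a
rough sketch of why an idle client behind a NAT can miss traffic the
fileserver initiates, and why more frequent polling acts as a keep-alive.
It is a toy model, not OpenAFS code: the function name, the 30-second NAT
timeout and the polling intervals are all hypothetical values chosen for
illustration.

```python
def dropped_callbacks(nat_timeout_s, client_send_times, server_send_times):
    """Return the server-initiated packets that a NAT would drop.

    Toy model (an assumption, not OpenAFS behaviour): the NAT keeps a UDP
    port association alive for nat_timeout_s seconds after the client's
    most recent outbound packet, and drops inbound packets that arrive
    after the association has expired.
    """
    dropped = []
    for t in server_send_times:
        # Outbound packets the client sent at or before time t.
        earlier = [s for s in client_send_times if s <= t]
        # No association exists yet, or the last one has timed out.
        if not earlier or t - max(earlier) > nat_timeout_s:
            dropped.append(t)
    return dropped


# Hypothetical numbers: the NAT forgets UDP associations after 30 seconds.
NAT_TIMEOUT_S = 30
server_packets = [20, 100, 610]

# Client only talks to the fileserver every 10 minutes.
infrequent = dropped_callbacks(NAT_TIMEOUT_S, [0, 600], server_packets)

# Client polls every 25 seconds, which keeps the association alive.
frequent = dropped_callbacks(NAT_TIMEOUT_S, list(range(0, 700, 25)),
                             server_packets)

print(infrequent)  # [100]
print(frequent)    # []
```

The second case only works because the client sends 28 packets where the
first sends 2, which is the extra network and fileserver load mentioned in
the quoted text.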

> That said, NAT's break the Internet.  Avoid using them if you can.

NATs are a fact of life on the internet today. We try to avoid them where 
possible, but the real world isn't perfect and we can't always control the 
complete environment. I simply recommend not putting OpenAFS clients behind 
them. We should avoid driving cars with petroleum-powered internal combustion 
engines since they pollute the air, but somehow it keeps happening. What is 
convenient is often chosen over what is perceived to be correct.

> At this point, OpenAFS 1.2 is pretty stale.  We did indeed decide not to do
> 2.6 support in that version, but instead focus on the 1.3/1.4 branch, so if
> you want 2.6 support, then you'll need something relatively recent.  I
> suggest 1.4.0rc2 (or better, RC3 when that gets released).  Really, things
> have been pretty stable for some time now, we've just been trying to squish
> as many bugs as we can before a 1.4 release.

Indeed, but in our environment we do a fair amount of testing and also rely on
confidence from other similar environments before we replace what's already
working. I appreciate all the efforts to fix bugs. Hopefully our users will
have a better impression of AFS once 1.4 is released.

> Locally, we are running 1.3.85 or so in production on 2.6 machines on both
> i386 and amd64, and have seen no problems.

Great, once 1.4 is released we'll evaluate it and hopefully it can be used in
our environment. We're currently testing 1.3.87 on i386 and amd64, and will
probably end up trying it on PPC in the future. We still have issues with
obtaining tokens from a kaserver on login under amd64, but those will
hopefully be sorted out by the time 1.4 rolls out.

> PAG support has been available for quite some time.  Yes, if you run an old
> enough version you won't get PAG support.  So don't run something that old.

PAGs aren't the issue so much as keeping our kaserver architecture alive until
our full Kerberos V infrastructure is ready to be released. Once I have newer
versions of OpenAFS working in the same manner and with the same reliability
that we have with the 1.2.x versions, we'll be ready to migrate. Either that,
or we'll wait until the new architecture takes over.

> Frankly, I hate the included backup system.  However, there are a number of
> good alternatives available, depending on your environment.

Agreed, but for some organizations it has to suffice. Thanks for the comments 
in any case.

Lester Barrows