[OpenAFS] OpenAFS in a production environment
Lester Barrows
barrows@email.arc.nasa.gov
Thu, 1 Sep 2005 19:48:16 -0700
Hi Jeffrey,
On Thursday 01 September 2005 6:43 pm, you wrote:
> There are certainly some performance issues, but they're rather more
> complex than is suggested here. If it were easy, we'd have fixed it by
> now.
Sure, the protocols behind AFS are almost certainly going to be more complex
than my understanding of them. I'm simply sharing my observations of OpenAFS
performance based on our usage over the past several years.
> OpenAFS _clients_ work fine behind a NAT that provides reasonable
> connection tracking and does not time out UDP port associations too
> quickly. For those that do time out such associations quickly, it is
> possible to increase the frequency with which the cache manager polls the
> fileserver, resulting in a "keep-alive" effect, but this has the
> disadvantage of additional load on the network and fileservers.
In my experience, more than one OpenAFS client behind the same NAT works
poorly with every NAT I've ever used, whether hardware such as Cisco or
Foundry routers, or software such as iptables with the Linux kernel. There may
be a few types of NAT which work properly, and increasing the polling
frequency may indeed help, but from an architectural standpoint I wouldn't
recommend placing several AFS clients behind a NAT. In my experience it's
simply asking for trouble, which is the context in which my response was
written.
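
As a rough illustration of the timeout problem described above, here is a
minimal Python sketch of a NAT that drops an idle UDP port mapping. It is a toy
model, not OpenAFS code: the function names and all of the timing values are
hypothetical, chosen only to show why a polling interval longer than the NAT's
idle timeout can leave a fileserver unable to reach the client.

```python
def mapping_alive(last_outbound, now, nat_timeout):
    """Return True if the NAT still holds the UDP port mapping at time `now`.

    The mapping is assumed to expire once `nat_timeout` seconds have passed
    since the client's last outbound packet (a simplifying assumption).
    """
    return (now - last_outbound) < nat_timeout


def callback_delivered(callback_time, poll_interval, nat_timeout):
    """Model a client that sends one packet every `poll_interval` seconds,
    starting at t=0, and a server packet that arrives at `callback_time`.

    Returns True if the NAT would still forward the server's packet.
    """
    # Time of the most recent client poll at or before the callback.
    last_outbound = (callback_time // poll_interval) * poll_interval
    return mapping_alive(last_outbound, callback_time, nat_timeout)


if __name__ == "__main__":
    # Hypothetical numbers: a 30 second NAT idle timeout.
    print(callback_delivered(500, poll_interval=600, nat_timeout=30))
    print(callback_delivered(500, poll_interval=20, nat_timeout=30))
```

With the slow poll the server's packet arrives long after the mapping has
expired; with the fast poll it gets through, at the cost of the extra traffic
Jeffrey mentions.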
> That said, NAT's break the Internet. Avoid using them if you can.
NATs are a fact of life on the internet today. We try to avoid them where
possible, but the real world isn't perfect and we can't always control the
complete environment. I simply recommend not putting OpenAFS clients behind
them. We should avoid driving cars with petroleum-powered internal combustion
engines since they pollute the air, but somehow it keeps happening. What is
convenient is often chosen over what is perceived to be correct.
> At this point, OpenAFS 1.2 is pretty stale. We did indeed decide not to do
> 2.6 support in that version, but instead focus on the 1.3/1.4 branch, so if
> you want 2.6 support, then you'll need something relatively recent. I
> suggest 1.4.0rc2 (or better, RC3 when that gets released). Really, things
> have been pretty stable for some time now, we've just been trying to squish
> as many bugs as we can before a 1.4 release.
Indeed, but in our environment we do a fair amount of testing and also rely on
confidence from other similar environments before we replace what's already
working. I appreciate all the efforts to fix bugs. Hopefully our users will
have a better impression of AFS once 1.4 is released.
> Locally, we are running 1.3.85 or so in production on 2.6 machines on both
> i386 and amd64, and have seen no problems.
Great. Once 1.4 is released we'll evaluate it, and hopefully it can be used in
our environment. We're currently testing 1.3.87 on i386 and amd64, and will
probably end up trying it on PPC in the future. We still have issues with
obtaining tokens from a kaserver on login under amd64, but those will
hopefully be sorted out by the time 1.4 rolls out.
> PAG support has been available for quite some time. Yes, if you run an old
> enough version you won't get PAG support. So don't run something that old.
PAGs aren't the issue so much as keeping our kaserver architecture alive until
our full Kerberos V infrastructure is ready to be released. Once I have newer
versions of OpenAFS working in the same manner and with the same reliability
that we have with the 1.2.x versions, we'll be ready to migrate. Either that,
or we'll stay on 1.2.x until the new architecture takes over.
> Frankly, I hate the included backup system. However, there are a number of
> good alternatives available, depending on your environment.
Agreed, but for some organizations it has to suffice. Thanks for the comments
in any case.
Regards,
Lester Barrows