[OpenAFS] Evaluating OpenAFS: Questions

Jean-Francois.Doyon@CCRS.NRCan.gc.ca Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Wed, 12 Jan 2005 11:18:45 -0500


Hello,

For a project I am working on, I am evaluating the use of a distributed
filesystem as a potential solution to my problems.

So far I've discovered Red Hat's GFS, Coda, and OpenAFS.  I've started
reading up on OpenAFS and I must say I am quite impressed: the product
looks mature and featureful.

A quick overview of what I'm looking at: multiple terabytes (5-10?) of
geospatial data distributed among at least 2, and possibly up to 5 or 6,
different geographical locations.  I need to make this data
universally/pervasively accessible in a high-performance, fault-tolerant
manner.  There will be a layer of web services on top of this, as well as
possibly a content management system if that proves sensible.

I was wondering, however, if someone could answer some high-level
questions about the product?

1) API: Is there an API available to control OpenAFS-related functionality
from 3rd party applications?  And if so, are there bindings to various
languages?  (I'm especially interested in Python and Java, though if there
is a C/C++ API I may be willing to create my own bindings.)  This would be
used to potentially automate some system tasks such as scheduled or
event-based replication, user management, etc ...

1a) If there is no API, are the command line tools safe to call through a
system call?  Are the various return codes well documented and so on, so
that I may integrate them into an application?

2) User Security: I understand that AFS has its own authentication
mechanisms and database.  I also often saw references to linking the AFS
security into the client's local security (/etc/passwd).  What about NIS?
Or other PAM-based authentication?  I'm wondering about integration into
the corporate authentication systems, such as the Windows domains for
example, or the NIS domains.  A given user might have different UIDs on
different boxes, and managing the ID match between the local password
database and the AFS one could quickly get onerous.

3) Because of the sheer size of the data (and the fact it will basically
grow indefinitely), I would like to consider the option of using
replication as a form of backup (10TB worth of tapes, and the management
overhead for all this, will likely prove prohibitive).  I would simply
make sure all data gets replicated to *at least* one other location
(available storage permitting).  The web services/application layer would
need to be aware of the failovers in order to make sure the service
closest to the data is always used, to avoid going over WAN links for
dynamic/on-demand data use (which takes me back to questions 1 and 2).

4) How safe is the protocol itself?  Could I mount shares across the open
internet?  Is there encryption available?  Basically, if a totally
external organization wants to "peer" into the filesystem, can it be done
safely and reliably as far as OpenAFS is concerned (assuming all other
factors, such as network bandwidth and network-level security, are
properly managed of course)?
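
To make question 1a a bit more concrete, this is roughly the kind of
wrapper I have in mind from the Python side.  It is only a sketch: the
names run_tool and ToolResult are my own, and the command in the example
is a stand-in (the Python interpreter itself) so that it runs anywhere.
In practice argv would be one of the AFS command line tools, and whether
their exit statuses are meaningful enough to rely on is exactly what I am
asking.

```python
import subprocess
import sys
from typing import NamedTuple, Sequence


class ToolResult(NamedTuple):
    """Outcome of one external command (hypothetical helper type)."""
    returncode: int
    stdout: str
    stderr: str


def run_tool(argv: Sequence[str], timeout: float = 60.0) -> ToolResult:
    """Run an external command without a shell and capture its result.

    Passing argv as a list avoids shell quoting problems.  check=False
    means a non-zero exit status is returned to the caller instead of
    raising, so the application can decide how to react to it.
    """
    proc = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return ToolResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in command; an AFS tool invocation would go here instead.
    result = run_tool([sys.executable, "-c", "print('ok')"])
    if result.returncode != 0:
        print("command failed:", result.stderr.strip())
    else:
        print("command output:", result.stdout.strip())
```

The application would then branch on returncode and fall back to parsing
stdout/stderr only where the exit status alone is not informative.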

I think that's about it for now ...

Thanks in advance!

Jean-François Doyon
Internet Service Development and Systems Support / Spécialiste de
développements internet et soutien technique
Canada Centre for Remote Sensing / Centre Canadien de télédétection
Natural Resources Canada / Ressources naturelles Canada
http://atlas.gc.ca
Tel./Tél. : (613) 992-4902
Fax: (613) 947-2410