[OpenAFS] Evaluating OpenAFS: Questions

Derrick J Brashear shadow@dementia.org
Wed, 12 Jan 2005 11:31:06 -0500 (EST)


On Wed, 12 Jan 2005 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:

> A quick overview of what I'm looking at: Multiple tera-bytes (5-10?) of
> geospatial data distributed among at least 2, and possibly up to 5 or 6
> different geographical locations.  I need to make this data
> universally/pervasively accessible, in a high-performance, fault-tolerant
> manner.  There will be a layer of web services on top of this, as well as
> possibly a content management system if that proves sensible.

As an aside, I think some level of "WebSphere" was essentially this.

> 1) API: Is there an API available to control OAFS related functionality
> through 3rd party applications? And if so bindings to various languages?
> (I'm especially interested in Python and Java, though if there is a C/C++
> API I may be willing to create my own bindings).  This would be used to
> potentially automate some system tasks such as scheduled or event-based
> replication, user management, etc ...

There are Perl bindings distributed externally, and Java bindings
distributed with the 1.3.x versions.

> 2) User Security:  I understand that AFS has its own security
> authentication mechanisms and database.  I also often saw references to
> linking the AFS security into the client's local security (/etc/passwd).

That wouldn't be helpful. You could certainly continue to list users there
rather than using a directory service, but password verification would
be done with Kerberos (by getting Kerberos tickets, or AFS tokens
directly); otherwise you'd be unable to speak authenticated to the AFS
servers.

> What about NIS?  Or other PAM-based authentication?

You can use PAM modules which will do Kerberos and/or AFS verification and 
ticket/token setting. NIS could be used for user information lookup (as a 
directory) but not for authentication.

> I'm wondering about
> integration into the corporate authentication systems, such as the Windows
> domains for example, or the NIS domains.  A given user might have different
> UID's on different boxes and managing the ID match between the local
> password database and the AFS one could quickly get onerous.

Do you mean numeric UIDs, or the username string?

> 3) Because of the sheer size of the data (and the fact it will basically
> grow indefinitely), I would like to consider the option of using replication
> as a form of backup (10TB worth of tapes, and the management overhead for
> all this, will likely prove prohibitive).  I would simply make sure all
> data gets replicated to *at least* one other location (Available storage
> permitting).  The web services/application layer would need to be aware of
> the failovers in order to make sure the service closest to the data is
> always used to avoid going over WAN links for dynamic/on-demand data use
> (Which takes me back to questions 1 and 2).

We don't have automatic replication, though your frontend could
re-replicate after any data store. As for using the closest replica, you
can do this automatically by setting a priority list on the client called
"server preferences". It is keyed by the IP address of the server, and the
client will then prefer to fetch from the servers in the order you've
specified.
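
As a rough sketch of how a frontend might script both steps, the Python
below only builds the command lines for the stock "vos release" and
"fs setserverprefs" tools. The volume name, host names and rank values are
made up for illustration, and it assumes the volume already has read-only
sites defined. A lower rank means a more preferred server.

```python
import subprocess


def release_command(volume):
    """Build the `vos release` call that pushes the read/write volume
    out to its existing read-only replica sites."""
    return ["vos", "release", "-id", volume]


def serverprefs_command(ranks):
    """Build an `fs setserverprefs` call from a {server: rank} mapping.

    Servers are emitted from most to least preferred (lowest rank first);
    the order is only for readability, the ranks decide the preference.
    """
    cmd = ["fs", "setserverprefs", "-servers"]
    for server, rank in sorted(ranks.items(), key=lambda item: item[1]):
        cmd += [server, str(rank)]
    return cmd


def run(cmd):
    """Execute a prepared command and raise if it exits non-zero."""
    subprocess.run(cmd, check=True)


# Hypothetical usage (not executed here):
#   run(release_command("geo.data"))
#   run(serverprefs_command({"fs-local.example.org": 20000,
#                            "fs-remote.example.org": 40000}))
```

Releasing a volume needs administrative rights on the servers, and setting
preferences needs root on the client, so a real frontend would have to
arrange credentials for both before calling run().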

> 4) How safe is the protocol itself? Could I mount shares across the open
> internet?  Is there encryption available? Basically, if a totally external
> organization wants to "peer" into the filesystem, can it be done safely and
> reliably as far as OAFS is concerned (Assuming all other factors, such as
> network bandwidth and network level security are properly managed of course).

Encryption is available, but will be more interesting in a few months when 
more encryption types are available. Is your geospatial data sensitive?

The access control issue for sharing out of AFS isn't a big deal, to the
extent that you trust Kerberos to be secure. If you do have worries about
that, you may also want to wait for the true Kerberos 5 support, which will
be introduced at the same time, before you share data generally.

And as another aside, I will tell you that you've hit one of my areas of
interest: the largest use of my own home AFS cell is geospatial data.