[OpenAFS] Evaluating OpenAFS: Questions
Derrick J Brashear
shadow@dementia.org
Wed, 12 Jan 2005 11:31:06 -0500 (EST)
On Wed, 12 Jan 2005 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:
> A quick overview of what I'm looking at: Multiple tera-bytes (5-10?) of
> geospatial data distributed among at least 2, and possibly up to 5 or 6
> different geographical locations. I need to make this data
> universally/pervasively accessible, in a high-performance, fault-tolerant
> manner. There will be a layer of web services on top of this, as well as
> possibly a content management system if that proves sensible.
As an aside, I think some level of "WebSphere" was essentially this.
> 1) API: Is there an API available to control OAFS related functionality
> through 3rd party applications? And if so bindings to various languages?
> (I'm especially interested in Python and Java, though if there is a C/C++ I
> may be willing to create my own bindings). This would be used to
> potentially automate some system tasks such as scheduled or event-based
> replication, user management, etc ...
There are Perl bindings, distributed externally, and Java bindings,
distributed with versions 1.3.x.
> 2) User Security: I understand that the AFS has its own security
> authentication mechanisms and database. I also often saw references to
> linking the AFS security into the client's local security (/etc/passwd).
That wouldn't be helpful. Certainly you could continue to list users there
rather than using a directory service, but the password verification would
be done with Kerberos (by getting Kerberos tickets, or AFS tokens
directly); otherwise you'd be unable to speak authenticated to the AFS
servers.
> What about NIS? Or other PAM-based authentication?
You can use PAM modules which will do Kerberos and/or AFS verification and
ticket/token setting. NIS could be used for user information lookup (as a
directory) but not for authentication.
> I'm wondering about
> integration into the corporate authentication systems, such as the Windows
> domains for example, or the NIS domains. A given user might have different
> UID's on different boxes and managing the ID match between the local
> password database and the AFS one could quickly get onerous.
Do you mean numeric uids, or the username string?
> 3) Because of the sheer size of the data (and the fact it will basically
> grow indefinitely), I would like to consider the option of using replication
> as a form of backup (10TB worth of tapes, and the management overhead for
> all this, will likely prove prohibitive). I would simply make sure all
> data gets replicated to *at least* one other location (Available storage
> permitting). The web services/application layer would need to be aware of
> the fail overs in order to make sure the service closest to the data is
> always used to avoid going over WAN links for dynamic/on-demand data use
> (Which takes me back to questions 1 and 2).
We don't have automatic replication, though your frontend could
re-replicate after each store of data. As to using the closest replica, you
can do this automatically by setting a priority list on the client called
"server preferences". It is keyed by the IP address of the server, and the
client will then prefer to fetch from the servers in the order you've
specified.
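To make the two steps above concrete, here is a rough sketch of how a
frontend might drive them from Python by building the command lines for
"vos release" (push a read/write volume out to its read-only replicas) and
"fs setserverprefs" (rank file servers on a client, lower rank preferred).
The volume name, the addresses and the rank values are made up for
illustration, and the helper functions are hypothetical; treat it as a
starting point under those assumptions, not as the way it must be done.

```python
def release_command(volume):
    """Build a `vos release` command line for one volume.

    Running it propagates the read/write volume to its read-only sites,
    which is the "re-replicate after a store" step.
    """
    return ["vos", "release", "-id", volume]


def server_prefs_command(ranks):
    """Build an `fs setserverprefs` command line from {server: rank}.

    Lower ranks are preferred by the client. Entries are emitted in
    ascending rank order purely for readability.
    """
    args = ["fs", "setserverprefs", "-servers"]
    for server, rank in sorted(ranks.items(), key=lambda item: item[1]):
        args += [server, str(rank)]
    return args


if __name__ == "__main__":
    # Hypothetical volume name and documentation-range addresses.
    ranks = {"198.51.100.20": 40000, "192.0.2.10": 1000}
    print(" ".join(release_command("geo.data")))
    print(" ".join(server_prefs_command(ranks)))
    # To actually run one of these, pass the list to subprocess.run(...)
    # on a machine with the AFS client tools and suitable privileges.
```

The sketch only builds the argument lists, so it can be checked without an
AFS client installed; executing them is left to the caller.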
> 4) How safe is the protocol itself? Could I mount shares across the open
> internet? Is there encryption available? Basically, if a totally external
> organization wants to "peer" into the filesystem, can it be done safely and
> reliably as far as OFS is concerned (Assuming all other factors, such as
> network bandwidth and network level security are properly managed of course).
Encryption is available, but will be more interesting in a few months when
more encryption types are available. Is your geospatial data sensitive?
The access control issue for sharing out of AFS isn't a big deal, to the
extent that you trust Kerberos to be secure. If you do have worries about
that, you may also want to wait for the true Kerberos 5 support, which will
be introduced at the same time, before you share data generally.
And as another aside, I will tell you that you've hit one of my interest
areas: the largest use of my own home AFS cell is geospatial data.