[OpenAFS] Evaluating OpenAFS: Questions

Jean-Francois.Doyon@CCRS.NRCan.gc.ca Jean-Francois.Doyon@CCRS.NRCan.gc.ca
Wed, 12 Jan 2005 12:18:47 -0500


Derrick,

Great information! Thank you very much.

Personally, I'm biased towards Zope for web application frameworks.  I also
love Python :)
This doesn't worry me too much, though. For now at least, I'm going to limit
my interest to automating basic OAFS features only, and I can use Perl or
Java for that (tasks such as registering/adding a new "data provider" to the
system, for example, and triggering data replication based on such an event).
Also, I suppose that if such bindings exist I could write Python ones
based on them.
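
For illustration only, the event-triggered replication mentioned above could
also be scripted from Python without any bindings, by shelling out to the
standard "vos release" command (which pushes a read/write volume's contents
to its read-only replicas). This is a minimal sketch, not a tested
integration: the helper names and the volume name "geo.data" are
hypothetical, and it assumes the vos binary is on the PATH and the caller
already holds suitable AFS administrative credentials.

```python
import subprocess
from typing import List, Optional


def vos_release_argv(volume: str, cell: Optional[str] = None) -> List[str]:
    """Build the argument list for `vos release` on one volume."""
    if not volume:
        raise ValueError("volume name must not be empty")
    argv = ["vos", "release", "-id", volume]
    if cell:
        # -cell selects a cell other than the client's default one.
        argv += ["-cell", cell]
    return argv


def release_volume(volume: str, cell: Optional[str] = None,
                   dry_run: bool = True) -> List[str]:
    """Release a volume after new data lands in it.

    With dry_run=True (the default) nothing is executed; the command
    that would be run is only returned, which is handy for testing.
    """
    argv = vos_release_argv(volume, cell)
    if not dry_run:
        # check=True raises CalledProcessError if vos exits non-zero.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # "geo.data" is a made-up volume name used purely as an example.
    print(" ".join(release_volume("geo.data")))
```

A "new data provider registered" hook would then call release_volume() with
dry_run=False once the data has been written.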

Is there an API reference somewhere?

Authentication: I have to admit I'm not up to speed on the details of
authentication.  Here's the end result I would hope to achieve:  Users can
log into their Windows workstations and map a drive to the distributed
filesystem.  To keep things easy for everyone, they mount this drive through
standard Windows methods, which means through SMB.  I would therefore
imagine an AFS-aware server mounting the AFS and then sharing it back
out as a Samba share, for example.  This also works nicely to get around
security domain issues.  The problem is how to keep the users synchronized,
if that is necessary at all.  There is obviously no need for a 1:1
equivalency; most users would probably simply have read-only access, which
can all be done under the same user.

I use Windows as the example here, but it could just as well be a Solaris
server running GIS software that needs access to it, with users logging into
this machine normally by being authenticated through NIS or something like
that.

I suspect this could prove challenging somehow :) I would want to minimize
the user management overhead for the corporate IT guys.

The nature of the geospatial data is quite varied.  Some is entirely free,
some isn't (You have to pay for it).  Some could also be potentially
sensitive yes ... At least I can't assume it would never be. Some, although
free, require licensing agreements ... And so on ...

Ah, well I'm glad to hear others have applied this type of tool to
geospatial data! I'd love to hear success stories in this field specifically.
To get geospatial for a moment: I plan on putting OGC Web Services (at
least) on top of this data, such as WMS, WFS, and so on ... As well as a
registry.
Hopefully I could trigger events such as a registry update on the addition
of data, for example.  Or make the feature server aware of it as well.
Because data is distributed and replicated, I'd want said service to always
use the best source possible (You've addressed this .... thanks!).

All in all a very cool solution I think :)  Now I just have to sell it ....
The large storage requirements might be a killer though ... We'll see.

Thanks again!

Cheers,
J.F.

-----Original Message-----
From: openafs-info-admin@openafs.org
[mailto:openafs-info-admin@openafs.org]On Behalf Of Derrick J Brashear
Sent: January 12, 2005 11:31 AM
To: openafs-info@openafs.org
Subject: Re: [OpenAFS] Evaluating OpenAFS: Questions


On Wed, 12 Jan 2005 Jean-Francois.Doyon@CCRS.NRCan.gc.ca wrote:

> A quick overview of what I'm looking at: Multiple terabytes (5-10?) of
> geospatial data distributed among at least 2, and possibly up to 5 or 6
> different geographical locations.  I need to make this data
> universally/pervasively accessible, in a high-performance, fault-tolerant
> manner.  There will be a layer of web services on top of this, as well as
> possibly a content management system if that proves sensible.

An aside, I think some level of "WebSphere" was essentially this.

> 1) API: Is there an API available to control OAFS related functionality
> through 3rd party applications? And if so bindings to various languages?
> (I'm especially interested in Python and Java, though if there is a C/C++
> one I may be willing to create my own bindings).  This would be used to
> potentially automate some system tasks such as scheduled or event-based
> replication, user management, etc ...

There are Perl bindings distributed externally, and Java bindings
distributed with versions 1.3.x.

> 2) User Security:  I understand that the AFS has its own security
> authentication mechanisms and database.  I also often saw references to
> linking the AFS security into the client's local security (/etc/passwd).

That wouldn't be helpful. Certainly you could continue to list users there
rather than using a directory service, but the password verification would
be done with Kerberos (by getting Kerberos tickets, or AFS tokens
directly); otherwise you'd be unable to speak authenticated to the AFS
servers.

> What about NIS?  Or other PAM-based authentication?

You can use PAM modules which will do Kerberos and/or AFS verification and 
ticket/token setting. NIS could be used for user information lookup (as a 
directory) but not for authentication.

> I'm wondering about
> integration into the corporate authentication systems, such as the Windows
> domains for example, or the NIS domains.  A given user might have different
> UID's on different boxes and managing the ID match between the local
> password database and the AFS one could quickly get onerous.

numeric uids, or the username string?

> 3) Because of the sheer size of the data (and the fact it will basically
> grow indefinitely), I would like to consider the option of using replication
> as a form of backup (10TB worth of tapes, and the management overhead for
> all this, will likely prove prohibitive).  I would simply make sure all
> data gets replicated to *at least* one other location (Available storage
> permitting).  The web services/application layer would need to be aware of
> the fail overs in order to make sure the service closest to the data is
> always used to avoid going over WAN links for dynamic/on-demand data use
> (Which takes me back to questions 1 and 2).

We don't have automatic replication, though your frontend could
re-replicate after any data store. As to using the closest replica, you
can do this automatically by setting a priority list on the client called
"server preferences". It is keyed by the IP address of the server, and the
client will then prefer to fetch from the servers in the order you've
specified.
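
As a rough sketch of how such a priority list might be scripted per client:
server preferences are set with "fs setserverprefs -servers", giving each
file server a rank where a lower rank means more preferred. The helper name,
the addresses (taken from the documentation-only 192.0.2.0/24 and
198.51.100.0/24 ranges) and the rank values below are assumptions chosen for
illustration; the function only builds the command line and leaves running
it (as root on the client) to the caller.

```python
from typing import Dict, List


def setserverprefs_argv(ranks: Dict[str, int]) -> List[str]:
    """Build an `fs setserverprefs` command from {server address: rank}.

    Lower ranks are preferred by the client, so the nearest file server
    should get the smallest number.
    """
    if not ranks:
        raise ValueError("at least one server/rank pair is required")
    argv = ["fs", "setserverprefs", "-servers"]
    # Emit servers from most to least preferred purely for readability;
    # the ranks themselves, not the order, decide the preference.
    for server, rank in sorted(ranks.items(), key=lambda item: item[1]):
        if rank < 0:
            raise ValueError(f"rank for {server} must not be negative")
        argv += [server, str(rank)]
    return argv


if __name__ == "__main__":
    # Hypothetical site: a local file server and a remote one over the WAN.
    prefs = {"198.51.100.20": 40000, "192.0.2.10": 20000}
    print(" ".join(setserverprefs_argv(prefs)))
```

The current values on a client can be inspected with "fs getserverprefs".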

> 4) How safe is the protocol itself? Could I mount shares across the open
> internet?  Is there encryption available? Basically, if a totally external
> organization wants to "peer" into the filesystem, can it be done safely and
> reliably as far as OAFS is concerned (Assuming all other factors, such as
> network bandwidth and network level security, are properly managed of
> course).

Encryption is available, but will be more interesting in a few months when 
more encryption types are available. Is your geospatial data sensitive?

The access control issue for sharing out of AFS isn't a big deal, to the
extent that you trust Kerberos to be secure. If you do have worries about
that, you may also want the true Kerberos 5, which will be introduced at
the same time, before you share data generally.

And as another aside, I will tell you that you've hit one of my interest 
areas; The largest use of my own home AFS cell is geospatial data.
