[OpenAFS] Re: LDAP backend for PTS?

Fri, 20 Nov 2009 15:16:18 -0500

Holger Rauch <holger.rauch@empic.de> had replied, and
Andrew Deason <adeason@sinenomine.net> had kindly clarified
parts already,

> Date:    Fri, 20 Nov 2009 10:12:14 CST
> To:      openafs-info@openafs.org
> Subject: [OpenAFS] Re: LDAP backend for PTS?
> 
> On Fri, 20 Nov 2009 13:30:59 +0100
> Holger Rauch <holger.rauch@empic.de> wrote:
> 
> > > Why do you care how many berkeley DB's you have?
> > 
> > Because I want to store all user/group/authentication related data in
> > a centralized way via OpenLDAP, so that only one Berkeley DB is
> > maintained.
> 
> Yes, but the question is still "why?". Is this because you want one
> place to backup? Or because you want to be able to change all user/group
> information with only LDAP-aware tools? Some other reason?

What he said.

One of the reasons you might value being LDAP-centric is that
then you don't have to use Berkeley DBs - you can use other backing
databases, or even completely different implementations of LDAP,
transparently.  You're getting hung up on implementation details.
That's "how".

>>> [...]
>>> username is easy (uid.  or not.  cn?)
>>> user viceid is easy (uidNumber.)
>>> groupname is easy (cn, probably)
>>> group viceid.  Um.  uidNumber?  Do your groups have numbers?
>>
>>Yes, the groups do have numbers since I'm using the ldapscripts
>>package from Debian Lenny to populate my OpenLDAP DIT with Unix group
>>info and I choose the same UIDs/GIDs also for OpenAFS users/group when
>>creating them via pts commands.

Ok.  Point of clarification here, because it's important.
	AFS groups are not UNIX groups.
	UNIX groups suck.
This is very important.  Unix groups were designed to support
an organization of a few hundred users, and are almost completely
useless even for that.  AFS groups were designed to support
many thousands of users.

I don't believe you are using the same GIDs in AFS that you had
in Unix.  GIDs are positive integers.  AFS group viceids are
negative integers.
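
A minimal sketch of that sign convention, in Python.  The function
names are made up for illustration, and the "negate the GID" mapping
is only one hypothetical way a site might pair the two number spaces;
nothing in AFS does this for you.

```python
def kind_of_viceid(viceid: int) -> str:
    """Classify a pt id by its sign: users are positive, groups negative."""
    if viceid > 0:
        return "user"
    if viceid < 0:
        return "group"
    return "neither"  # this sketch does not treat 0 as a user or a group


def naive_gid_to_group_viceid(gid: int) -> int:
    """Hypothetical site convention: pair Unix GID n with AFS group id -n."""
    if gid <= 0:
        raise ValueError("expected a positive Unix GID")
    return -gid


if __name__ == "__main__":
    for number in (1000, -204):
        print(number, kind_of_viceid(number))
    print(naive_gid_to_group_viceid(100))
```

The point is only that a Unix GID can never be used unchanged as an
AFS group viceid; some mapping has to be chosen and written down.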

Funny you should mention debian.  My debian machines have at least
50 groups provided not just by the system, but by various individual
packages that run and install things.  So I have groups like "daemon",
"sys", "news", "uucp", "kmem" that are important to debian, but certainly
don't belong in AFS.  At least they have numbers that are probably the
same on all my debian machines.  (I have other linux machines that
don't have most of these groups).  I have groups like "messagebus",
"postgres","libuuid","avahi", which are completely dependent on the
software I happen to have installed on each machine, with numbers that
definitely vary between machines.  About the only thing I don't have is
groups that I want in AFS.

One of the things that makes AFS suitable for *enterprise* (as opposed
to merely departmental) use is the ability for users to self-administer
groups.  Here's the key thing: can all of *your* users throw paper wads
at you?  If so, you don't care (as much) about this.  If not, your
users can't rely on you to manage their self-identified group data.
Your users may be quite willing to use ldap to do that instead of pts.

So,	local "installed-package" per-machine groups don't belong in afs.
	Unix groups can't be created or managed by users.

>>
>>> Then there's the intersection data.
>>
>>What's that exactly?
>>
>>> members of groups?
>>
>>That's possible for "regular" POSIX accounts, so why shouldn't it be
>>possible with OpenAFS pts info? Or am I missing something here?

Do you mean objectclass posixGroup and posixAccount?
If so, I am positive you don't want to overload this with
AFS semantics.  See previous rant about Unix groups.

By "intersection data", I meant which users are members of which
groups.  I had tried to illustrate that with ascii art,
but apparently the points of the intersections got lost.

So at my institution, in addition to afs we also run a large
ldap directory.  I'm a member of several groups, which I can see
this way:
	ldapsearch -H ldap://ldap.itd.umich.edu -x \
	member=uid=mdw,ou=People,dc=umich,dc=edu
(well, I see more without -x, but I hope you don't have my tgt.)

These groups were constructed mainly for mail forwarding purposes,
so you won't see any integers that could be used as AFS viceids.
However, you will see 29 of the 33 groups of which I'm a member.
You'll also see some of these are user groups, some are system groups,
and a list of other members of those groups.

In this particular case, I'm searching on "member".  For "ldap the
protocol" there's nothing special about that attribute.  I could search
"owner", and get an equally valid but rather different picture.  From the
ldap search standpoint, "owner" and "member" (or even "errorsTo") are
exactly equivalent.  It's entirely up to the application that handles
the results to make the interpretation, and mail (for umich.edu at least)
intentionally uses more than one interpretation.
So:	Ldap does not have just one kind of "membership".
	It allows you to construct many different notions of "membership".
	Part of the "pt/ldap implementation" problem is defining this.
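
To make the "nothing special about that attribute" point concrete,
here is a small Python sketch that only builds search filter strings.
The helper names are invented for this example; the escaping follows
what I understand RFC 4515 to require for filter values.  It does not
talk to any server.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that are special inside an LDAP filter value."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def relation_filter(attribute: str, dn: str) -> str:
    """Build an equality filter; the attribute name is just a parameter."""
    return "(%s=%s)" % (attribute, escape_filter_value(dn))


MDW = "uid=mdw,ou=People,dc=umich,dc=edu"

if __name__ == "__main__":
    # Same DN, three different notions of "belongs to".
    for attribute in ("member", "owner", "errorsTo"):
        print(relation_filter(attribute, MDW))
```

Which of those filters means "is in the group" for pt purposes is a
decision the implementation has to make; the protocol won't make it.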

> 
> > > Also you might want to support "groups within groups".
> > 
> > While it certainly might be useful for some case, it looks like a
> > rather special case to me.
> 
> Well, if you allow supergroups in your ptservers, you're going to need
> to account for this, or something is going to break.

"groups within groups" solved enough "cases" that it was well
worth the implementation effort (at umich.edu).  Of course, it
may not be worth it for your environment (and you may not need
to even account for it if you in fact only care about getcps.)

So, if you issue the ldapsearch command above, one of the groups
you *won't* see there is
cn=Umich-Systems,ou=System Groups,ou=Groups,dc=umich,dc=edu
mail sent to that group nevertheless reaches me, because I'm a
member of cn=Umich-Staff which in turn is a member of cn=Umich-Systems.
As you can see, at umich.edu, "groups within groups" are used for
mail forwarding too.
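
A sketch of what "accounting for" nested groups means when computing
someone's effective memberships, in Python.  The table below is toy
data shaped like the example above (the third group is invented), and
a real implementation would fetch direct members from ldap or the prdb
rather than from a dict.

```python
from collections import deque

# Toy data: each group maps to its *direct* members (users or groups).
DIRECT_MEMBERS = {
    "Umich-Systems": {"Umich-Staff"},
    "Umich-Staff": {"mdw"},
    "Other-Group": {"someone-else"},  # hypothetical, for contrast
}


def groups_containing(entity, direct_members):
    """Return every group holding entity directly or through nesting."""
    found = set()
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for group, members in direct_members.items():
            # The "not in found" check also stops loops between groups.
            if current in members and group not in found:
                found.add(group)
                queue.append(group)
    return found


if __name__ == "__main__":
    print(sorted(groups_containing("mdw", DIRECT_MEMBERS)))
```

A plain search on direct membership would stop at Umich-Staff; the
walk above is what picks up Umich-Systems as well.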

>> > You don't want to know about gssapi names.
>> Why would they be needed?

Well, you asked.  Mind you, the right way to ask this is to ask Simon
Wilkinson, in Scotland, with large amounts of the finest quality alcohol
at hand.  You are much more likely to have pleasant memories that way.

Anyways, one of the things on the afs roadmap is gssapi.  One commonly
expressed reason to support gssapi is to support non-kerberos-5
authentication mechanisms.  That in turn usually turns out to be based
on some form of pki, which in turn introduces dealing with subject naming
and certificate issuing chains
{	mind you, "x.509 subject names" are "x.500" names.  This
	has nothing to do with ldap.  Pure coincidence.  Move right
	along; don't stop and wave at the inmates.	}
From the gssapi point of view, what's important is that the client's identity
may not be human readable, and is returned as an opaque blob that
is never intended to be input (or even seen) by mere mortals.  So, that
means in order to support gssapi in the afs context, there needs to be
a mapping (somewhere) of gssapi identity blobs into authorization
identities, i.e., pt entries (or their moral equivalent).  We should
probably also have human readable labels on those identities.
{	and there's an enrollment/provisioning problem...?	}

And that, in a nutshell, is the pt server / gssapi naming problem.
At least as I understand it (I guarantee you, not enough alcohol
was involved in *my* understanding of this issue.)
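
For what it's worth, the shape of that mapping can be sketched in a
few lines of Python.  Everything here is hypothetical (the class
names, the fields, the idea of keying directly on the raw blob); it
shows only the data structure, not how gssapi produces the blob or
how enrollment would be authorized.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PtIdentity:
    viceid: int  # the authorization identity a fileserver would use
    label: str   # human readable name, since the blob is not


class IdentityMap:
    """Maps opaque identity blobs to pt-style entries (illustration only)."""

    def __init__(self) -> None:
        self._by_blob: Dict[bytes, PtIdentity] = {}

    def enroll(self, blob: bytes, identity: PtIdentity) -> None:
        # The enrollment/provisioning problem lives behind this call.
        self._by_blob[bytes(blob)] = identity

    def lookup(self, blob: bytes) -> Optional[PtIdentity]:
        return self._by_blob.get(bytes(blob))


if __name__ == "__main__":
    identities = IdentityMap()
    identities.enroll(b"\x04\x01opaque-example", PtIdentity(1234, "mdw"))
    print(identities.lookup(b"\x04\x01opaque-example"))
    print(identities.lookup(b"never-enrolled"))
```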

Since you haven't proposed using non-kerberos-5 gssapi, this is
not an issue for you.  But it *is* a design issue for a *general*
solution that will be attractive to others.

> 
> > > There are more pt rpcs you have to support if you want to provide
> > > read-only query access (pts examine, members, etc.) - and even more
> > > if you want to provide group read/write support as well.
> > 
> > The interesting question is: Why do I have to support the pt rpcs if
> > all I want to do is storing and querying pt data in an LDAP schema?
> 
> I believe Marcus was saying this assuming you were implementing a new
> ptserver implementation that uses LDAP as its backend storage. You
> obviously do not need to implement any of them if you have _both_ the
> 'normal' ptservers running and also have some LDAP server running that
> allows you to query ptserver data.

If your only interest in pt is making fileservers run, then you don't
care about the other rpcs.  If you have *users* who like using the pts
commands, then depending on how much service you want to give those users,
you have more operations to support.
{	at the very least, you probably want "fs listacl" to work.	}

..
>>> 1/ sync ldap into ptserver (Andrew describes a low-tech way;
>>>       higher tech approaches are also possible.)
>> Do you have some link for me where this is described in more detail?

Andrew gave you more in another message.  For the higher
tech approaches, look at the openldap replication protocol,
or read up on "persistent searches", such as here,

http://directory.fedoraproject.org/wiki/Howto:Persistent_search
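
Whatever mechanism delivers the changes, the sync step itself boils
down to comparing two membership tables.  A minimal Python sketch,
assuming you have already pulled group membership out of ldap and out
of pts into dicts of sets (how you do that is up to you).  It only
plans "pts adduser" / "pts removeuser" invocations and does not run
them, and it ignores creating or deleting the groups themselves.

```python
def plan_pts_sync(ldap_groups, pts_groups):
    """Return pts argv lists that would make pts membership match ldap."""
    commands = []
    for group in sorted(ldap_groups):
        wanted = ldap_groups[group]
        current = pts_groups.get(group, set())
        for user in sorted(wanted - current):
            commands.append(["pts", "adduser", "-user", user, "-group", group])
        for user in sorted(current - wanted):
            commands.append(["pts", "removeuser", "-user", user, "-group", group])
    return commands


if __name__ == "__main__":
    # Toy data; the group and user names are made up.
    ldap_view = {"staff": {"alice", "bob"}}
    pts_view = {"staff": {"bob", "carol"}}
    for argv in plan_pts_sync(ldap_view, pts_view):
        print(" ".join(argv))
```

Keeping the plan separate from the execution makes it easy to eyeball
what a sync run would change before letting it touch the prdb.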

> 
> > > 2/ ptserver front-end, ldap backend.
> > > 	almost certainly useful to cache
> > 
> > Any more precise hints for this?
> 
> "It's a lot of work". What I believe Marcus is hinting at is basically
> rewriting a large portion of the ptserver backend. This isn't something
> you want to do unless you want to invest a lot of time or money in it.

I wouldn't call it "rewrite".  That implies you'd actually be reusing some
code from ptserver.  The only code of value there is ptint.xg.  So, you'd
be implementing a *new* rpc server, from scratch, implementing whatever
subset of rpcs you care about, with *all new* code.  This is actually less
work than rewriting.  That having been said, if you aren't a programmer
this is the point where you should run away screaming into the night.

As I think more on this, there are 2 subcases,
2a	ptserver front-end, talks to ldap backend database 'natively'.
		(hairy, worst of both worlds, etc.)
2b	ptserver front-end, translates to ldap queries back-end.
		(the more straightforward case.
		But maybe slower, which is why caching matters.)
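
For case 2b, the caching can be as simple as remembering recent
answers for a short while.  A Python sketch under stated assumptions:
"fetch" stands in for whatever function turns a name into a membership
list by querying ldap, and the class name and the 60 second default
are arbitrary choices for the example.

```python
import time


class TTLCache:
    """Remember results of a slow lookup for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical slow backend query
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_lookup(name):
        print("querying backend for", name)
        return [-204]  # made-up group id list

    cache = TTLCache(slow_lookup, ttl_seconds=30.0)
    cache.get("mdw")  # goes to the backend
    cache.get("mdw")  # answered from the cache
```

The tradeoff is the usual one: a longer lifetime means fewer ldap
queries, but group changes take that much longer to reach fileservers.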

> 
> > > 3/ ptserver "as is", ldap front-end into prdb.
> > > 	Openldap provides some dandy frameworks for this.
> > 
> > > Haven't yet come across them. Any links?
> 
> We're not OpenLDAP developers :) (At least, I'm not). I don't have
> information on the plugin APIs, but I would imagine something like this
> would not be a difficult plugin to create.

Start reading here,
http://www.openldap.org/faq/data/cache/1165.html

Also see the source code for openldap.

> 
> > > 4/ use "pag" in kerberos ticket.  Like DCE and MS.
> > 
> > What's "pag"? Haven't heard about this one as of yet.
> 
> I think Marcus is referring to the MS (and DCE, apparently?) practice of
> storing a bunch of authorization data in the kerberos ticket when you
> acquire your ticket, instead of getting that information from a separate
> server (in this case, ptserver). This route would require modifications
> to the KDC, or some way of using Microsoft's PAC data or something along
> those lines.

Sorry, I did indeed mean "pac".
DCE did it first; Microsoft borrowed the concept.
Yup, you'd have to modify stuff.  Not just the kdc, but also
the fileserver.
Of course, you're probably already using the ldap backend with MIT
kerberos (right?), so this way should be ever so very shiny and glittery.

...

Thank you very much, by the way, for asking this.
In case it wasn't obvious, I'm using your question as a convenient
opportunity to do a brain dump.

				-Marcus Watts