[OpenAFS] converting Kaserver and protection server to working with LDAP

Thu, 07 Jun 2001 20:48:35 -0400

Jeffrey Hutzelman <jhutz@cmu.edu> sent:
> On Mon, 4 Jun 2001, Marcus Watts wrote:
> 
> > I was wrong in the above in that the cache manager itself doesn't
> > talk PT.  Lucky you.  However, the cache manager does know about viceIDs and
> > ACLs, expects there to be a 1-1 mapping between AFS viceIDs and Unix
> > UIDs, and expects ACLs to contain a list of viceIDS (which are actually
> > manipulated using the "fs sa" and "fs la" commands.)  Breaking this means
> > you won't be compatible with vanilla AFS (interoperability may be
> > a problem.)
> 
> While there are potential problems, these really aren't...
> 
> - The cache manager knows nothing about the relationship between vice ID's
>   and UNIX UID's.  It sometimes tracks credentials by UNIX UID (when there
>   is no pag), but that is the extent of its knowledge of UNIX UID's.

ViceID vs. uid
	stat call
	SUID programs
		file src/VNOPS/afs_vnop_attrs.c
		function afs_CopyOutAttrs
			attrs->va_uid = avc->m.Owner;
	Both of these have an implicit ViceID -> uid mapping.
	I don't think there's anything in the cache manager that
	goes the other way.
ACLs:
	pioctl interface
		file src/VNOPS/afs_pioctl.c
		function PSetAcl
			calls RXAFS_StoreACL
				which expects to be given a string
				containing VICEids and other stuff.
		function PGetAcl
			calls RXAFS_FetchACL
				which returns a string containing
				VICEids and other stuff.
> 
> - The cache manager never sees vice ID's on ACL's.  In fact, the only time
>   it sees ACL's at all is when you manipulate them via the 'fs' commands,
>   which do all their work via cache manager calls.  Even then, the ACL's
>   the cache manager sees are those exported by the fileserver's FetchACL
>   and StoreACL calls, which use _names_, not vice ID's.

Yup, you're quite right, the ACL list in the RXAFS interface does
contain pt names, and the cache manager doesn't actually interpret the
opaque string (beyond checking the length).  (Hm; ptserver only checks
for newlines not tabs in pt names...)  Both the fileserver and the "fs"
command map between names and ViceIDs using pt.  (Which is in some ways
both terribly elegant and kind of ugly.)  I would expect odd results,
and possibly complaints, if an "fs la" on someone else's cell (or even
on one's own) didn't work.  I suppose it's a matter of interpretation
whether a setup with that failure still counts as one that
"interoperates".

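To make the remark about tabs and newlines concrete, here is a minimal
sketch.  It assumes the opaque ACL string is laid out as a positive-entry
count, a negative-entry count, and then one name<TAB>rights line per
entry; that layout, the function names, and the sample names are
assumptions for illustration, not something spelled out in this thread.

```python
# Sketch only. Assumed layout of the opaque ACL string:
#   "<npos>\n<nneg>\n" followed by one "<name>\t<rights>\n" line per entry.

def format_acl(positive, negative):
    """Build the opaque string from lists of (name, rights) pairs."""
    lines = [str(len(positive)), str(len(negative))]
    for name, rights in list(positive) + list(negative):
        lines.append(f"{name}\t{rights}")
    return "\n".join(lines) + "\n"


def parse_acl(opaque):
    """Split the opaque string back into (positive, negative) entry lists."""
    lines = opaque.split("\n")
    npos, nneg = int(lines[0]), int(lines[1])
    body = lines[2:2 + npos + nneg]
    if len(body) != npos + nneg:
        raise ValueError("ACL string is shorter than its counts claim")
    entries = []
    for line in body:
        fields = line.split("\t")
        if len(fields) != 2:
            # A tab or newline inside a name lands here.
            raise ValueError(f"malformed ACL entry: {line!r}")
        entries.append((fields[0], int(fields[1])))
    return entries[:npos], entries[npos:]


if __name__ == "__main__":
    acl = format_acl([("system:administrators", 127), ("alice", 9)], [])
    print(parse_acl(acl))
```

A name containing a tab or a newline makes format_acl produce a string
that parse_acl rejects, which is why a names-based format needs the
name service to refuse both characters.
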
If you used LDAP and stored DNs rather than a short identifier in ACLs
(assuming, of course, that you still supported ACLs), you'd probably
need a different mechanism to pass the much longer strings that would
result, in addition to the obvious changes to the file server & "fs"
commands.
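
A back-of-the-envelope sketch of the size concern follows.  Everything
in it is an assumption for illustration: the name<TAB>rights layout of
the ACL string, the 1024-byte buffer limit, the 20-entry ACL, and the
sample DNs under dc=example,dc=edu.

```python
# Sketch only: all constants and names here are illustrative assumptions.
ASSUMED_OPAQUE_MAX = 1024   # assumed size limit for the opaque ACL string
ASSUMED_MAX_ENTRIES = 20    # assumed maximum number of entries per ACL


def acl_wire_size(names, rights=127):
    """Length of an ACL string holding one positive entry per name."""
    header = f"{len(names)}\n0\n"
    body = "".join(f"{name}\t{rights}\n" for name in names)
    return len(header) + len(body)


def fits(names):
    """True if the ACL for these names fits in the assumed buffer."""
    return acl_wire_size(names) <= ASSUMED_OPAQUE_MAX


if __name__ == "__main__":
    short = [f"user{i:02d}" for i in range(ASSUMED_MAX_ENTRIES)]
    dns = [f"uid=user{i:02d},ou=People,dc=example,dc=edu"
           for i in range(ASSUMED_MAX_ENTRIES)]
    for label, names in (("short names", short), ("DNs", dns)):
        print(label, acl_wire_size(names), fits(names))
```

Each extra DN component adds to every entry, so deeper directory trees
or longer RDN values eat the available space quickly.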

> 
> Actually, it can be worse than bad.  Years ago, one of the failure modes
> of AFS was commonly known as a "ptserver meltdown".  This occurred when
> a ptserver problem caused a backlog of GetCPS requests from fileservers
> that would essentially grow without bound.  The result would be that the
> ptservers would become and _stay_ heavily overloaded, and fileservers
> would start to hang waiting for responses.
> 
> There have been some changes since then that make this scenario less
> likely, but it is still the case that the ptserver is probably the most
> heavily loaded of the AFS database services, and the majority of the
> requests it services are expensive ones.  Think very carefully before
> replacing it. 

Been there; done that.  The ptserver meltdown problems were largely
ubik problems and are mostly fixed.

There's one other nasty bit of the database servers, and that's the
backup database.  Fortunately, it isn't actually visible to the client,
unless load from that slows something else down (like ptserver).  It
turns out it has a really small hash bucket size, so the various tables
quickly get really big hash chains.  Some of the database utilities
(like deleting a dump tape) crawl up those hash chains lots of times as
they process all the volume records, so that eats up loads of CPU &
disk.  That's not much of a problem for a small cell.  For us it was
annoying enough that we long ago gave up running the backup server on
the real database servers; today we run a UDP datagram redirector that
sends things on to the actual, sadly overloaded machine with the actual
backup database.  That way it only affects backup, not everyone's
workstation.  Of course, this doesn't have anything to do with
uid<->viceid or LDAP.
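
The hash chain cost can be sketched as follows, assuming "small hash
bucket size" means few buckets relative to the number of records.  The
function names and the numbers are illustrative and are not taken from
the backup server code.

```python
# Sketch only: a chained hash table with evenly spread keys.

def chain_lengths(n_records, n_buckets):
    """Chain length per bucket when keys 0..n_records-1 hash as key % n_buckets."""
    lengths = [0] * n_buckets
    for key in range(n_records):
        lengths[key % n_buckets] += 1
    return lengths


def total_probe_cost(n_records, n_buckets):
    """Comparisons needed to look up every record once.

    Finding the k-th item on a chain costs k comparisons, so a chain of
    length L costs 1 + 2 + ... + L = L * (L + 1) / 2 in total.
    """
    return sum(l * (l + 1) // 2 for l in chain_lengths(n_records, n_buckets))


if __name__ == "__main__":
    records = 100_000
    for buckets in (64, 1024, 65_536):
        print(buckets, max(chain_lengths(records, buckets)),
              total_probe_cost(records, buckets))
```

With a fixed number of buckets the total cost grows roughly with the
square of the record count, which fits the observation that a small
cell hardly notices while a large one burns CPU and disk.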

				-Marcus Watts
				UM ITCS Umich Systems Group