[OpenAFS] Re: Web server over AFS

Fri, 15 Dec 2000 14:02:18 -0500

Peter Scott <Peter.J.Scott@jpl.nasa.gov> writes:
> Message-Id: <4.3.2.7.2.20001214184412.00b22d40@psdt.com>
> Date: Thu, 14 Dec 2000 18:54:02 -0800
> To: openafs-info@openafs.org, info-afs@transarc.com
> From: Peter Scott <Peter.J.Scott@jpl.nasa.gov>
> Subject: Web server over AFS
> 
> Hello.  I am looking for input from anyone who has successfully run an 
> institutional web server using AFS and provided users the ability to create 
> web interfaces to persistent data.  The hard part is ensuring that customer 
> CGI programs are the only processes allowed to modify that data and that 
> no-one else with an AFS account can get at it through CGI programs in their 
> own directory.  Non-AFS systems solve this with setuid, which is not 
> available on AFS.  I am talking about a centralized web server which is 
> administered by AFS admins, no user access allowed to local disk space.
> 
> We have a couple of theoretical solutions already, so I really want to 
> constrain the answers to solutions that have been proven in 
> practice.  Bonus points if yours works with load balancing or round-robin 
> type web server multiplexing.  Thanks in advance.
> --
> Peter Scott
> Peter.J.Scott@jpl.nasa.gov
> 
> 

SUID doesn't exist in AFS, because the trust model doesn't exist.  SUID
works under Unix because the OS can be trusted.  With AFS, a user could
boot a rogue OS that tweaks things, or halt the OS at any time and
munge things, so the fileserver can't trust that the program running on
the client machine is really working the way intended by the owner of
the file from which the program was loaded.  Even in straight Unix,
SUID gets turned off by ptrace for similar reasons.

If you are storing data in AFS, then whatever accesses that data has to
have a token (ticket for AFS) that permits that access.  There are a
couple of ways to arrange for this.

The straightforward way would be to arrange for the CGI script to do
an AFS setpag, klog, and unlog.  In order to do the klog, it will need
to have a srvtab.  The srvtab will function in much the same way that a
password would work for a human.  The stock AFS klog program won't do
for the "klog" step here - you need a special program that works from
the srvtab.  We (umich.edu) use gettoken, a copy of which can be found in
	/afs/umich.edu/group/itd/build/mdw/gettoken for this purpose.
(This isn't packaged up nicely; use at own risk, etc., etc.)  If your cgi
script were a C program, you'd probably want to hard-wire the
equivalent logic into your cgi...
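
As a rough sketch of the setpag / klog / work / unlog sequence above
(Python, and not what we actually run): the wrapper hands one shell
script to a pag-creating shell, so the token lives only inside that
pag.  The "pagsh -c" default is an assumption about your AFS client
install, the acquire command is a placeholder (gettoken's arguments
aren't spelled out here), and build_script / run_with_token are
made-up names.

```python
import shlex
import subprocess


def build_script(acquire, work, release):
    """Build a shell script: get a token, do the work, always drop the token.

    Each argument is an argv-style list.  If acquiring the token fails,
    the work is never run.  Otherwise the work's exit status is preserved
    and the release step (e.g. unlog) runs regardless of that status.
    """
    a, w, r = (shlex.join(cmd) for cmd in (acquire, work, release))
    return f"{a} || exit 1; {w}; status=$?; {r}; exit $status"


def run_with_token(acquire, work, release, pag_shell=("pagsh", "-c")):
    """Run the script under pag_shell and return its exit status.

    pag_shell is assumed to start a shell in a fresh pag and to accept a
    script via -c; substitute whatever your site provides.
    """
    script = build_script(acquire, work, release)
    return subprocess.run([*pag_shell, script]).returncode
```

A call might look like run_with_token(["/path/to/token-helper"],
["./real-work.cgi"], ["unlog"]), where the first list is whatever
srvtab-based program you end up using.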

There are two main issues with the above cgi logic:
	(1) the cgi script has to have access to the srvtab.
		With the typical apache httpd setup, this could
		be done one of two ways:
		(a)	permit the srvtab read-everybody; *any* cgi
			script or other program has potential access to
			the srvtab.  This works best when the web
			server has tightly controlled access, dedicated
			functionality, so the functionality of all
			cgi scripts can be trusted.
		(b)	the cgi script is SUID, to an owner that has
			access to the srvtab.
	(2) each run of the cgi script has to authenticate
		all over again.  For a server with a low to medium
		volume of traffic, this may be fine.  For a server
		with a large volume of traffic, there may be various
		issues, including the "only 1 new pag per second" logic
		in the cache manager.
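
For issue (1), a quick way to see which case you are in, assuming the
srvtab sits on local disk where plain Unix mode bits apply (inside AFS
it is the directory ACL that matters, so this check would tell you
little there).  The function name is made up for illustration.

```python
import os
import stat


def srvtab_exposure(path):
    """Classify who may read the file at path, judging by mode bits only."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IROTH:
        # Case (a): any cgi script or other local program can read the key.
        return "world-readable"
    if mode & stat.S_IRGRP:
        return "group-readable"
    # Case (b) territory: only the owner (e.g. a SUID cgi's owner) can read it.
    return "owner-only"
```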

Another way to package things up is that the cgi script runs, and talks
to a separate server daemon that does the actual work.  This is a more
complicated approach to set up, but has some advantages: since the
daemon can be persistent, it can remember things between calls (careful
how you use this).  More importantly, the daemon can maintain a persistent
AFS token shared among multiple calls.

There are lots of ways the daemon could work; it could run on the same
machine and use Unix domain sockets (if all users on that machine can be
trusted), or it could run on a different machine and require that
a kerberos ticket be used to access it.  If run on a different machine,
it can enforce its own trust rules without concern that the web server
might be compromised.  On the other hand, the web server still needs
to do some form of "proxy" authentication; it needs to say "Hi, I'm
bill-the-web-server@X, working on behalf of andrew-the-user@X" to
the daemon.  Since there are no pags or AFS file server connections,
using kerberos to authenticate to the daemon is still nicer
than getting an AFS token straight off.
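
A minimal sketch of the same-machine variant (Python, Unix domain
socket).  The request format, the ALLOWED table and all names are
invented for illustration.  Note that over a Unix socket the daemon
simply believes the claimed identities, which is only reasonable if
everyone on the machine is trusted; with a remote daemon, kerberos
would be what backs up the "I'm bill, working for andrew" claim.

```python
import socket
import socketserver
import threading

# Hypothetical policy table: which front-end may act on behalf of which users.
ALLOWED = {"bill-the-web-server@X": {"andrew-the-user@X"}}


class WorkHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Request format (made up for this sketch): "<server> <user> <action>\n"
        fields = self.rfile.readline().decode().split()
        if len(fields) != 3 or fields[1] not in ALLOWED.get(fields[0], ()):
            self.wfile.write(b"DENIED\n")
            return
        # The real work against AFS would go here, under the daemon's own
        # long-lived token; the counter stands in for state kept between calls.
        self.server.calls += 1
        self.wfile.write(f"OK {self.server.calls} {fields[2]}\n".encode())


def start_daemon(path):
    """Bind a Unix domain socket at path and serve in a background thread."""
    server = socketserver.UnixStreamServer(path, WorkHandler)
    server.calls = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def cgi_request(path, line):
    """What the cgi script would do: send one request line, read one reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall(line.encode() + b"\n")
        return s.makefile().readline().strip()
```

The cgi script itself then needs no token and no srvtab; it only needs
permission to connect to the socket, which file permissions on the
socket's directory can restrict.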

There are lots of other ways to set things up, but I think this covers
a good range of possibilities.  One other possibility, for instance,
is to teach apache about pags, and to use the same pag across
multiple cgi script runs, doing an "unlog" after each cgi script run.
It's possible to do this with an apache loadable module, so this
isn't as hard as it might sound.  I even have source code, somewhere,
that does this, which I last built with apache 1.3.12.

			-Marcus Watts
			UM ITCS Umich Systems Group