[OpenAFS] Databases & AFS (revisited)

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 18 Jan 2007 00:03:09 -0500


On Saturday, December 23, 2006 06:14:32 PM +0100 Davor Ocelic 
<docelic@mail.inet.hr> wrote:

> Looking at [2], which appears to be CMU's class assignment, the
> students are supposed to create a Postgres database within their
> AFS volumes, without a word of problems that might create.

A bit delayed, but...

That document is over 3 years old; AFAIK it does not represent a "current" 
assignment for any class.  It represents one assignment for one class, 
developed by the faculty teaching that class.  It should certainly not be 
taken as CMU's position on whether putting database files in AFS is a good 
idea.

Some applications, including database servers, use byte-range locking. 
Depending on your platform, byte-range locks may be handled locally but 
turned into whole-file locks on the server, handled locally but not 
reflected on the server at all, or completely ignored.  UNIX 
applications that depend on working byte-range locks will generally not 
work when the same file is used by multiple AFS client systems at the same 
time; however, many of them will work fine if all programs using the file 
are on the _same_ AFS client, or if there is only one such program at a 
time.
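
For anyone who wants to see what their own client does, here is a minimal 
sketch of a byte-range lock probe in Python.  It is only an illustration: 
the helper names (try_lock_range, probe_from_child, demo) are made up for 
this example, and it uses a temporary file on whatever filesystem your temp 
directory lives on.  One process locks bytes 0-99, then a forked child 
tries an overlapping range and a disjoint range.  On a local filesystem I 
would expect the overlapping attempt to be refused and the disjoint one to 
succeed.  Note that both processes here are on the same machine, so this 
says nothing about whether two different AFS clients would see each 
other's locks.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_range(fd, start, length):
    """Try a non-blocking exclusive POSIX lock on bytes [start, start+length).

    Returns True if the lock was granted, False if another process holds a
    conflicting lock. Any other error is re-raised.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise


def probe_from_child(path, start, length):
    """Fork a child that tries to lock the range; True if the child got it.

    POSIX locks are owned per process, so a second process is needed to
    observe a conflict with a lock held by the parent.
    """
    pid = os.fork()
    if pid == 0:
        # Child: open the file independently and report via exit status.
        status = 2
        try:
            fd = os.open(path, os.O_RDWR)
            status = 0 if try_lock_range(fd, start, length) else 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo(directory=None):
    """Lock bytes 0-99 in the parent, then probe two ranges from a child.

    `directory` is a hypothetical knob for pointing the test file at a
    particular filesystem; by default the system temp directory is used.
    """
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        tmp.write(b"\0" * 200)
        tmp.flush()
        if not try_lock_range(tmp.fileno(), 0, 100):
            raise RuntimeError("could not take the initial lock")
        overlapping = probe_from_child(tmp.name, 50, 10)
        disjoint = probe_from_child(tmp.name, 100, 100)
        return overlapping, disjoint


if __name__ == "__main__":
    print(demo())
```

Passing a directory inside an AFS volume to demo() would show how locks 
behave between two processes on that one client; what another client 
sees depends on the platform behavior described above.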

Even without the potential locking problems and performance penalties, 
running a database server or other long-running service backed by data 
stored in AFS (or any non-local filesystem) is fraught with peril.  Such a 
service, running on a perfectly working machine, can unexpectedly lose 
access to its data due to network problems, a fileserver outage, or even 
simple things like loss of tokens.  This is not something I would recommend 
for a production service.


However, short-term, light-duty uses like the Postgres assignment you 
mentioned will probably be OK.  In these situations, the user is running 
the database server using his own tokens, the database files are not 
accessed by anything else, and the server only runs as long as the user is 
logged in (in fact, the "servers" mentioned in this assignment are actually 
not servers at all, but public timesharing systems -- the users have only 
ordinary unprivileged access, and the machines reboot every night).  Since 
the database does not contain any critical data, a network or fileserver 
outage creates an inconvenience but no serious data loss.


-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA