[OpenAFS] Shared r/w access to numerous sqlite databases: an appropriate application for AFS?

Thomas Kula kula@tproa.net
Thu, 8 Apr 2010 16:38:16 -0400


On Thu, Apr 08, 2010 at 04:06:18PM -0400, Brandon Simmons wrote:
> 
> Thanks for the response. It seems like whole-file locking in sqlite
> would be a good choice for me in any case, and I can't imagine needing
> that kind of writing concurrency.
> 
> Doing a little more research, this message describes a few more issues
> with sqlite over NFS which I suppose might apply to AFS:
> 
> http://old.nabble.com/SQLite-on-NFS-cache-coherency-td15655701.html
> 
> In a situation where the whole-file locking scheme is used, would AFS
> be an acceptable choice? Would it be better than NFS?
> 
> For instance I envision a handful of clients on different machines
> each writing to a single sqlite DB every few seconds; would this
> defeat AFS's caching scheme?
> 

Basically, every time that sqlite db file is changed the fileserver
will have to notify all of the clients that have callbacks on 
that file, and then all the clients will have to go fetch that
file again (or the chunk that changed, maybe, I can never remember
that detail). It does kinda defeat caching; whether you still
get good enough performance for it to be useful is unknown.

This kind of thing does make me twitchy, since it is starting to
sound close to something that happens around here: every so often
someone (usually a student) re-invents this notion of finding a
lab full of unused machines, logging on to all of them and turning
AFS into a rather horrid and ineffective message passing interface.
Now, they usually do this at a higher rate than you are anticipating
(dozens or hundreds of file creations/modifications/deletions per
second; the really fun ones renew their credentials every time
as well, to the point where we've got a local unit of measurement
named after a user who did a rather impressive level of this...).
At that level, all the callbacks being broken on a large number of
vnodes from a large number of clients at a high rate usually makes
that particular fileserver unhappy.

So, the question of concurrency and whether sqlite will do the right
thing aside, you may want to try a good 200% of your expected load
from some number of clients on a volume on a fileserver you don't
particularly care about first and make sure you aren't going to
make things unhappy for yourself.
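
For what it's worth, a load test like that can be sketched in a few
lines of Python. This is only a rough sketch under assumptions: the
function name hammer, the load_test table and the default numbers are
all made up for illustration, and it says nothing about whether
sqlite's locking behaves correctly on AFS. It only measures how long
each write takes from one client.

```python
import os
import sqlite3
import time


def hammer(db_path, writes=100, interval=1.0, timeout=30.0):
    """Insert `writes` rows, pausing `interval` seconds between them.

    Returns the list of per-write latencies in seconds. `hammer` and the
    `load_test` table are hypothetical names used only for this sketch.
    """
    # `timeout` is how long sqlite waits on a locked database before
    # raising sqlite3.OperationalError.
    conn = sqlite3.connect(db_path, timeout=timeout)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_test (pid INTEGER, seq INTEGER, ts REAL)"
    )
    conn.commit()

    latencies = []
    for seq in range(writes):
        start = time.monotonic()
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO load_test VALUES (?, ?, ?)",
                (os.getpid(), seq, time.time()),
            )
        latencies.append(time.monotonic() - start)
        time.sleep(interval)

    conn.close()
    return latencies
```

To approximate 200% of the expected load, run it from twice as many
clients as you expect (or halve the interval), all pointed at the same
db file on the scratch volume, and compare the latencies against a run
on local disk while watching the fileserver.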

You may also want to make sure that other common AFS operations
don't cause problems. The only one that springs to mind offhand
is the brief period during a volume backup when the volume
is locked so the backup clone can be created. There may be other
examples that other folks can think of. Of course, your application 
should be making sure it handles these kinds of things already
anyways, because even local disk likes to be goofy on occasion.
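
One way an application might handle that kind of brief
unavailability is to retry the write with a growing delay. The sketch
below is an assumption about how you could structure it, not
something AFS or sqlite prescribes; write_with_retry and its
parameters are hypothetical names. It relies on the fact that
Python's sqlite3 module raises sqlite3.OperationalError for things
like a locked database or a file that cannot be opened.

```python
import sqlite3
import time


def write_with_retry(db_path, sql, params=(), attempts=5, base_delay=0.5):
    """Run one write statement, retrying on sqlite3.OperationalError.

    Returns the number of attempts it took. `write_with_retry` is a
    hypothetical helper for illustration only.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = sqlite3.connect(db_path, timeout=10.0)
            try:
                with conn:  # commits on success, rolls back on an exception
                    conn.execute(sql, params)
                return attempt + 1
            finally:
                conn.close()
        except sqlite3.OperationalError as exc:
            # Note: this also catches non-transient errors such as SQL
            # syntax errors; a real application would want to tell them apart.
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_error
```

With the defaults it gives up after five attempts spread over roughly
seven or eight seconds of waiting; whether that is long enough to ride
out a backup clone on your cell is something you would have to measure.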

I love sqlite, but I don't use sqlite databases that are located
in AFS outside of a single user (me) from a single machine (my
workstation) and a single process that happen to be looking at a
db file that just happens to be in AFS because, well, then I'll 
know where that file is and know it's getting backed up. Your 
mileage may vary, offer void except when it isn't, etc. etc.


-- 
Thomas L. Kula | kula@tproa.net | http://kula.tproa.net/