[OpenAFS] CellServDB file update...

Marcus Watts mdw@umich.edu
Tue, 16 May 2006 01:19:42 -0400


Rodney M Dyer <rmdyer@uncc.edu> and others write:
> From: Derrick J Brashear <shadow@dementia.org>
> To: Rodney M Dyer <rmdyer@uncc.edu>
> Cc: openafs-info@openafs.org
> In-Reply-To: <6.1.2.0.0.20060515231135.01c66ec0@unccmail.uncc.edu>
> Message-ID: <Pine.GSO.4.61-042.0605152335430.10923@johnstown.andrew.cmu.edu>
> References: <6.1.2.0.0.20060515223234.01c8bec0@unccmail.uncc.edu>
>  <Pine.GSO.4.61-042.0605152255590.10923@johnstown.andrew.cmu.edu>
>  <6.1.2.0.0.20060515231135.01c66ec0@unccmail.uncc.edu>
> Subject: Re: [OpenAFS] CellServDB file update...
> Sender: openafs-info-admin@openafs.org
> Errors-To: openafs-info-admin@openafs.org
> Date: Mon, 15 May 2006 23:37:38 -0400 (EDT)
> 
> On Mon, 15 May 2006, Rodney M Dyer wrote:
> 
> > At 10:57 PM 5/15/2006, Derrick J Brashear wrote:
> >> Yup. There was a bug. It should have been fixed, but a deadlock was 
> >> introduced (by me) in the afsconf package when CellServDB is reread. Broken 
> >> 1.2.4 or so, fixed 1.2.11 or so, iirc.
> >
> > Since it appears that the operation of copying a new CellServDB over an 
> > existing one, while a file server is in operation, is "undefined", does your 
> > "bug" fit into that "undefined" category?
> 
> yes
> 
> >> It will be reread during normal operation.
> >
> > Describe "normal".  ;)
> 
> your fileserver is talking to your clients
> you copy in a new CellServDB
> your fileserver becomes less interested in talking to clients and goes off 
> into a corner and talks to itself a lot
> 
> > Did you know your bug can bring down an entire cell, requiring all the 
> > clients to reauthenticate?
> 
> it cannot, actually. it can cause every server to need to be restarted
> but the tokens in the client cache will still work after the new servers 
> start
> 
> unless you have a dumb client which is "clever" and throws away its token
> 
> but that's not the server's fault
> 
> also, i should probably answer when i am more sober

Uh, when you're more sober -- why exactly do you *need*
to change the CellServDB on your fileservers?  This is
definitely not a "normal operational thing" to do in the
first place.

The fileserver (& friends) are server-side things.
The only CellServDB they need is
	/etc/openafs/etc/server/CellServDB
	/usr/afs/etc/CellServDB
(depending on whether you use Transarc paths).  The only cell that should be
listed here is the cell the server is actually in.  Listing any other
cells is "harmless" but not useful.  So, the only reason you *should*
be changing this is because you are adding or removing db hosts in
CellServDB.  Since that's got widespread client-side implications,
you certainly shouldn't expect it to be completely invisible to the
clients, at least not without careful planning and thought.  Adding or
removing things here also has serious implications for the
fileserver - at the very least, it needs to redo its ptserver
connections.
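
As a rough illustration of the "only your own cell" point, here is a
minimal Python sketch that parses CellServDB-style text and reports any
cells listed besides the local one.  It assumes the usual layout (a
">cellname" line followed by "address #hostname" lines) and nothing
fancier.  The function names, cell names and addresses are made up for
the example; none of this is part of OpenAFS.

```python
def parse_cellservdb(text):
    """Return {cell_name: [(address, hostname), ...]} from CellServDB-style text."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell header: ">cellname   #optional comment"
            name = line[1:].split("#", 1)[0].strip()
            current = cells.setdefault(name, [])
        elif current is not None:
            # Host line: "address   #hostname"
            addr, _, host = line.partition("#")
            current.append((addr.strip(), host.strip()))
    return cells


def extra_cells(text, this_cell):
    """Cells listed besides this_cell (harmless on a server, but not useful)."""
    return sorted(name for name in parse_cellservdb(text) if name != this_cell)


# Hypothetical sample data (documentation-only addresses).
SAMPLE = """\
>example.edu            #Hypothetical local cell
192.0.2.10              #afsdb1.example.edu
192.0.2.11              #afsdb2.example.edu
>other.example.org      #Hypothetical foreign cell
198.51.100.5            #db.other.example.org
"""

print(extra_cells(SAMPLE, "example.edu"))
```

In real use the text would come from the server-side CellServDB and
the cell name from ThisCell; an empty result means the file lists only
the server's own cell.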

It's certainly nice if the software does something "right"
automatically when the server-side CellServDB gets changed.  It sounds
like Derrick did that, modulo a minor bug or so.  It would also be nice
if the documentation at least described what was going to happen if it
isn't going to be "nice" behavior.  Sounds like the documentation at
least managed to identify that this was risky, even if it wasn't very
clear about why this was a problem.  At that point, the onus would seem
to be on the AFS administrators to try this out in advance in a test
environment and see what was going to happen, before trying it for real
and risking breaking things.
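
One cheap thing to do before even the test environment is to look at
exactly which db hosts a proposed CellServDB adds or removes.  A
minimal Python sketch, again assuming the plain ">cellname" /
"address #hostname" layout, with hypothetical function names and
made-up sample data:

```python
def db_hosts(text, cell):
    """Set of addresses listed under `cell` in CellServDB-style text."""
    hosts, in_cell = set(), False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Entering a new cell section; track whether it is the one we want.
            in_cell = line[1:].split("#", 1)[0].strip() == cell
        elif in_cell:
            hosts.add(line.split("#", 1)[0].strip())
    return hosts


def db_host_changes(old_text, new_text, cell):
    """Return (added, removed) db host addresses for `cell`, each sorted."""
    old, new = db_hosts(old_text, cell), db_hosts(new_text, cell)
    return sorted(new - old), sorted(old - new)


# Hypothetical before/after files (documentation-only addresses).
OLD = """\
>example.edu     #Hypothetical cell
192.0.2.10       #afsdb1.example.edu
192.0.2.11       #afsdb2.example.edu
"""
NEW = """\
>example.edu     #Hypothetical cell
192.0.2.10       #afsdb1.example.edu
192.0.2.12       #afsdb3.example.edu
"""

added, removed = db_host_changes(OLD, NEW, "example.edu")
print("added:", added, "removed:", removed)
```

This only tells you what the file change is.  It says nothing about
how the running servers or the clients will react, which is what the
test environment is for.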

Could you have changed other things like your KeyFile or ThisCell?  That
would certainly result in tossing tokens.

				-Marcus