[OpenAFS-devel] Re: [OpenAFS] Volume root corruptions - anybody seen those?

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 09 Jun 2008 11:15:00 -0400


--On Friday, June 06, 2008 10:10:22 AM +0200 Rainer Toebbicke 
<rtb@pclella.cern.ch> wrote:

> Hartmut and I discussed this over the phone and disagreed on whether this
> "synchronization" is actually the fileserver's role.
> He thinks it is, and fairly enough the existing code agrees with him.
> His patch is so small that I'll shut up and go straight and deploy it.
>
> Only for the record and on the grounds that even code existing before
> 1993 (lol just checked it in afs3.2) should not be immune from
> questioning for proper design, I'll point out nevertheless:
>
> . the fileserver has a hell of a job already and loading it with yet
> another function will hardly make it slimmer

I think the fssync server is the proper place for this synchronization.  I 
would be open to arguments about whether it is appropriate for the fssync 
server to be embedded in the fileserver, as opposed to some other process. 
The current model is almost certainly the result of evolution: from a time 
when the fileserver was the only process touching volumes (at least while 
it was up), to one in which there were various external volume utilities 
but only the fileserver ran all the time, to the current situation with 
both a long-running fileserver and a long-running volserver.


> . the fileserver is actually not concerned with managing volumes, certainly
> not all volumes which float around on the disks. Logically, it would then
> also have to manage synchronization of volumes it never cares about
> itself, such as the various temporary clones, R/W volids when the only
> one in sight is a R/O, etc.

Currently that is the case, more or less.  I don't believe the fileserver 
ever finds out about temporary clones; the volserver never "puts them back" 
via fssync like it does to newly created or restored volumes, and no 
exclusion is needed because every new temporary clone gets a new volume 
ID, assigned by the VLDB coordinator.
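To illustrate why fresh IDs make exclusion unnecessary, here is a minimal sketch assuming a single central allocator that only ever increases its high-water mark. The struct and function names are hypothetical and this is not the vlserver code; the point is only that two clone operations can never be handed the same volume ID, so they never contend for the same volume.

```c
#include <assert.h>

/* Hypothetical model of the VLDB coordinator's ID allocator (not OpenAFS code).
   Because it is the single authority and only moves forward, every ID it
   returns is distinct from every ID returned before it. */
struct id_allocator {
    unsigned long max_volume_id; /* highest volume ID handed out so far */
};

/* Reserve 'count' consecutive fresh IDs and return the first of them. */
unsigned long allocate_volume_ids(struct id_allocator *a, unsigned long count)
{
    assert(count > 0);
    unsigned long first = a->max_volume_id + 1;
    a->max_volume_id += count;
    return first;
}
```

Two temporary clones created back to back receive different IDs, so neither one needs to lock out the other.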


> . sooner or later a volume id will stay on the OfflineVolumes list for
> obscure reasons, you have no way to find out, requests for it will be
> denied and you'll scratch your head for a while. Then you'll restart the
> file server...

This shouldn't be able to happen, unless there is a fairly blatant bug in 
the fssync server.  Clients of the fssync service have stream-oriented 
connections, and when a connection goes away, the fssync server clears the 
offline list for that client.  This is in fact something I was worried 
about with Hartmut's patch, but I think it's OK.
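The cleanup described above can be sketched as follows, assuming a fixed-size offline table in which each entry remembers which fssync client connection took the volume offline. All names (take_offline, put_online, client_disconnected, the integer client handle) are hypothetical; this is not the actual OpenAFS fssync implementation, only an illustration of why a dropped connection cannot leave a volume stuck on the offline list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of a per-client offline list (not OpenAFS code). */
#define MAX_OFFLINE 64

typedef unsigned int VolumeId;

struct offline_entry {
    VolumeId vid;
    int client;  /* hypothetical handle of the fssync connection holding it */
    bool in_use;
};

static struct offline_entry offline_list[MAX_OFFLINE];

/* Record that 'client' took 'vid' offline. Fails if another client already
   holds the volume or the table is full. */
bool take_offline(int client, VolumeId vid)
{
    assert(vid != 0); /* this sketch treats 0 as an invalid volume ID */
    size_t free_slot = MAX_OFFLINE;
    for (size_t i = 0; i < MAX_OFFLINE; i++) {
        if (offline_list[i].in_use && offline_list[i].vid == vid)
            return false;
        if (!offline_list[i].in_use && free_slot == MAX_OFFLINE)
            free_slot = i;
    }
    if (free_slot == MAX_OFFLINE)
        return false;
    offline_list[free_slot] = (struct offline_entry){ vid, client, true };
    return true;
}

/* Explicit hand-back: only the client that took the volume may release it. */
bool put_online(int client, VolumeId vid)
{
    for (size_t i = 0; i < MAX_OFFLINE; i++) {
        if (offline_list[i].in_use && offline_list[i].vid == vid &&
            offline_list[i].client == client) {
            offline_list[i].in_use = false;
            return true;
        }
    }
    return false;
}

bool is_offline(VolumeId vid)
{
    for (size_t i = 0; i < MAX_OFFLINE; i++)
        if (offline_list[i].in_use && offline_list[i].vid == vid)
            return true;
    return false;
}

/* Called when a client's stream connection closes, cleanly or not: release
   everything it still holds. Returns the number of entries cleared. */
size_t client_disconnected(int client)
{
    size_t released = 0;
    for (size_t i = 0; i < MAX_OFFLINE; i++) {
        if (offline_list[i].in_use && offline_list[i].client == client) {
            offline_list[i].in_use = false;
            released++;
        }
    }
    return released;
}
```

The design choice is that ownership is tied to the connection rather than to an explicit "put back" message, so a crashed or killed utility releases its volumes as soon as the server notices the connection is gone.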


-- Jeff