[OpenAFS-devel] Re: [OpenAFS] Volume root corruptions - anybody
seen those?
Rainer Toebbicke
rtb@pclella.cern.ch
Fri, 6 Jun 2008 10:10:22 +0200
Jeffrey Hutzelman wrote:
> anyway. Hartmut's patch is on the right track -- the appropriate thing
> to do here is for the fssync service to provide the same protections for
> multiple fssync clients accessing the same volume that it does for a
> single client and the fileserver itself.
>
Hartmut and I discussed this over the phone and disagreed on whether
this "synchronization" is actually the fileserver's role.
He thinks it is and, fair enough, the existing code agrees with him.
His patch is so small that I'll shut up and deploy it right away.
Only for the record and on the grounds that even code existing before
1993 (lol just checked it in afs3.2) should not be immune from
questioning for proper design, I'll point out nevertheless:
. the fileserver has a hell of a job already and loading it with yet
another function will hardly make it slimmer
. the fileserver is actually not concerned with managing volumes, certainly
not all volumes which float around on the disks. Logically, it would
then also have to manage synchronization of volumes it never cares
about itself, such as the various temporary clones, R/W volids when
the only one in sight is a R/O, etc.
. sooner or later a volume id will stay on the OfflineVolumes list for
obscure reasons, you have no way to find out, requests for it will be
denied and you'll scratch your head for a while. Then you'll restart
the file server...
. denying a request requires code upstream to handle the denial and retry
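
To illustrate the last point, here is a minimal sketch of what "handle
that, and retry" means for a caller. Everything in it is hypothetical:
VolumeBusy and with_retry are made-up names for illustration, not part
of OpenAFS or of the fssync interface.

```python
import time


class VolumeBusy(Exception):
    """Hypothetical error: the request for a volume was denied."""


def with_retry(operation, attempts=5, delay=0.1, sleep=time.sleep):
    """Call operation(); on VolumeBusy, back off and try again.

    Re-raises VolumeBusy once `attempts` tries have all been denied.
    The `sleep` parameter is injectable so callers can avoid real waits.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except VolumeBusy:
            if attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each denial.
            sleep(delay * attempt)
```

Every caller that can be denied needs a wrapper of this kind, or an
equivalent loop, which is the upstream cost the point refers to.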
A "design" solution would probably use mutex-like objects, and even
administrative tools to inspect and manage them. The volserver's
transactions have some nice aspects, in particular transparency. And
in the long run I suspect the volserver could salvage volumes itself.
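
As a rough sketch of the "mutex-like objects" idea, assuming one lock
per volume id that records its holder so that a tool can list and, if
necessary, clear it: the class and method names below
(VolumeLockRegistry, list_held, force_release) are invented for
illustration and do not correspond to existing OpenAFS code.

```python
import threading
import time


class VolumeLockRegistry:
    """Hypothetical per-volume locks that remember who holds them."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._holders = {}  # volume id -> (owner, time acquired)

    def acquire(self, volume_id, owner, timeout=None):
        """Block until the volume is free; return False on timeout."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while volume_id in self._holders:
                remaining = None
                if deadline is not None:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False
                self._cond.wait(remaining)
            self._holders[volume_id] = (owner, time.time())
            return True

    def release(self, volume_id, owner):
        """Release a lock; only the recorded owner may do so."""
        with self._cond:
            holder = self._holders.get(volume_id)
            if holder is None or holder[0] != owner:
                raise RuntimeError("volume not held by this owner")
            del self._holders[volume_id]
            self._cond.notify_all()

    def list_held(self):
        """Inspection hook: map each locked volume id to its owner."""
        with self._cond:
            return {vid: h[0] for vid, h in self._holders.items()}

    def force_release(self, volume_id):
        """Administrative override; returns the evicted owner, if any."""
        with self._cond:
            holder = self._holders.pop(volume_id, None)
            self._cond.notify_all()
            return None if holder is None else holder[0]
```

With something like list_held and force_release, a volume id stuck for
obscure reasons could be found and cleared without restarting a server.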
However, as Derrick points out, the original problem might still come
from elsewhere (I fail to see how volserver/salvager/fileserver
stepping on each other's toes could explain the peculiar vnode length
field corruption we saw) so the "design" solution will have to wait.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155