[OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

Fri, 18 Jun 2010 08:31:24 -0400

In message <4C1B40AD.1040108@pclella.cern.ch>, Rainer Toebbicke writes:
>I beg to disagree: the Volume/Vnode back-end has by no means the same problems 
>that a file system might have. Damages there will never wildly destroy random 
>items on disk, as you would have to be afraid using in a file system. At least 
>in namei, damages in a volume are entirely contained therein, files themselves 
>are at the worst entirely replaced by others, they're never corrupted partly 
>other than being half-written or such. Of course files on disk can become 
>unfindable or directories can have bogus entries.

i don't think fast-restart and bitmap-later are the work of the devil
either.  if the server goes down, the users want it back NOW, especially
if it is during working hours.  they really don't care if one or two of
their files are broken somewhere.  i can salvage the partitions later
in the evening or on the weekend.

>I reckon that in over 15 years of AFS service we've probably had more bit 
>errors in files due to uncaught memory errors and uncaught transmission 
>errors, not speaking about the major culprit "programming errors", than nasty 
>inconsistencies after crashes which complete and immediate salvaging would 
>have caught.

this is actually a more insidious problem.  with the advent of the UPS,
file servers tend to stay up longer and longer (years in some cases,
i imagine).  this means that unless you do something, you never salvage
the partition.  should i salvage when i upgrade the servers?  possibly.
i don't recall that being a best practice, though it probably should be.
an upgrade does a clean shutdown, so it won't salvage by default.