[OpenAFS] Summary of recommended configuration options from the workshop

Kim Kimball dhk@ccre.com
Tue, 27 May 2008 10:34:00 -0600

OTOH, I've used fastrestart with great success.  In fact, we started 
using fastrestart when we migrated to 1.4.0 due to some unrememberable 
defect that caused the fileserver to crash [hard] periodically. 

After rebuilding with fastrestart the fileserver continued to crash but 
the users stopped caring -- since there wasn't any impact, or at least 
no impact that anyone reported.

I use the 'notifier' script option to automatically run a script to look 
for volumes that need to be salvaged, and salvage them individually.  If 
there are more than X%, the script invokes "bos salvage" instead.
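
A rough sketch of that decision in Python, for illustration only (this is not the actual script). The function name, the threshold parameter and the use of "-all" for the whole-server case are assumptions; how the list of damaged volumes is obtained (log parsing, vos output, etc.) is left out.

```python
def plan_salvage(server, damaged, total_volumes, threshold_pct):
    """Return the bos command lines to run, as argument lists.

    damaged: list of (partition, volume) pairs believed to need salvage.
    threshold_pct: the "X%" above which a full salvage is used instead
    (hypothetical parameter; the post does not give a value).
    """
    if total_volumes <= 0 or not damaged:
        return []
    pct = 100.0 * len(damaged) / total_volumes
    if pct > threshold_pct:
        # Assumption: "bos salvage" in the post means a whole-server salvage.
        return [["bos", "salvage", "-server", server, "-all"]]
    # Otherwise salvage each damaged volume on its own.
    return [
        ["bos", "salvage", "-server", server,
         "-partition", part, "-volume", vol]
        for part, vol in damaged
    ]
```

Each returned list could be handed to subprocess.run() by the notifier script; keeping the decision separate from the execution makes it easy to dry-run.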

The only caveat here is that the notifier script is executed multiple 
times -- as it 'fires' for both the fileserver and volserver processes.  
I've been meaning to report this as a defect -- it's almost certainly 
not intentionally coded this way.
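
One way a script could tolerate being fired more than once is to take a non-blocking lock at startup and exit if another copy already holds it. A minimal sketch, assuming a Unix host and a hypothetical lock file path chosen by the admin (this is not something the notifier mechanism provides):

```python
import fcntl
import os


def try_lock(lock_path):
    """Return an open fd holding an exclusive lock, or None if already held."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def unlock(fd):
    """Release the lock taken by try_lock()."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

This only stops two copies from running at the same time. If the second invocation starts after the first has finished, it simply runs again and should find nothing left to salvage.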

Without fastrestart all of my volumes are unavailable until salvage 
completes.

With fastrestart I 1) very rarely have any offline volumes to deal with 
and 2) return 99+% [almost always 100%] of my users to full 
functionality in less than a minute.  (The rest have to suffer.)

In more than two years of doing this no one has noticed. 

I don't think there's a black/white answer here -- for us fastrestart is 
a huge win.



Derrick Brashear wrote:
> On Mon, May 26, 2008 at 6:38 PM, Robert Banz <rob@nofocus.org> wrote:
>>> From the conference:
>>> Why Derrick doesn't use fastrestart
>>> 1) You have to have something parse logfiles and salvage it
>>> 2) If you're running an inode fileserver, every time you salvage you crawl
>>> all of the inodes. You salvage 10 volumes, you're going through 10*<number
>>> of inodes>
>> I agree with Derek's analysis --

> Dude, when did warlord comment?
>> that yes, in the event you'd really have to
>> salvage, you could salvage a lot.
>> However, in my experience, salvaging has only been necessary in the face of
>> a hard system crash -- basically, problems with data written to the
>> filesystem out-of-order from what AFS thinks it should be, etc. If you're
>> unlucky to be running in an environment where your storage is unstable, or
>> your filesystem doesn't guarantee (or close to it) ordered writes, you've
>> got other problems. Though, I'd say it's very reasonable to salvage after a
>> hard crash -- perhaps that's a job for an init script, or the administrator
>> that was investigating the cause of the failure.
> I want self-healing. Also, fileservers do themselves occasionally crash.
>> In most situations I was running into that required a fileserver
>> restart with prejudice (kill -9) -- things like thread lockups -- I've never
>> had to salvage, and fastrestart is a lifesaver when you have fileservers
>> with a good deal of data.  Customers don't enjoy 30+ minutes of outage.
> Me either. Even ignoring demand attach, namei plus well-tuned salvager
> is fast, though, so luckily i don't actually care.
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info