[OpenAFS] Summary of recommended configuration options from the
workshop
Kim Kimball
dhk@ccre.com
Tue, 27 May 2008 10:34:00 -0600
OTOH, I've used fastrestart with great success. In fact, we started
using fastrestart when we migrated to 1.4.0 due to some unrememberable
defect that caused the fileserver to crash [hard] periodically.
After rebuilding with fastrestart the fileserver continued to crash but
the users stopped caring -- since there wasn't any impact, or at least
no impact that anyone reported.
I use the 'notifier' script option to automatically run a script to look
for volumes that need to be salvaged, and salvage them individually. If
there are more than X%, the script invokes "bos salvage" instead.
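A rough sketch of that decision, for illustration only (this is not my actual script). The function name, the 10% threshold standing in for X, and the idea that the damaged volumes arrive as (partition, volume) pairs are all assumptions; how the damaged volumes get discovered is left out entirely. It also assumes "bos salvage" instead means a whole-server salvage with -all.

```python
def plan_salvage(server, damaged, total_volumes, max_fraction=0.10):
    """Return the bos commands (as argv lists) a notifier script might run.

    server:        fileserver machine name
    damaged:       list of (partition, volume) pairs needing salvage;
                   discovering them is outside this sketch
    total_volumes: number of volumes on the server
    max_fraction:  hypothetical stand-in for the "X%" threshold
    """
    if not damaged:
        return []
    if total_volumes > 0 and len(damaged) / total_volumes > max_fraction:
        # Too many damaged volumes: one full salvage of the server.
        return [["bos", "salvage", "-server", server, "-all"]]
    # Otherwise salvage each damaged volume individually.
    return [
        ["bos", "salvage", "-server", server,
         "-partition", partition, "-volume", str(volume)]
        for partition, volume in damaged
    ]
```

Returning argv lists rather than running anything keeps the logic easy to test; a real script would hand each list to something like subprocess.run.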
The only caveat here is that the notifier script is executed multiple
times -- as it 'fires' for both the fileserver and volserver processes.
I've been meaning to report this as a defect -- it's almost certainly
not intentionally coded this way.
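One way a script could tolerate being fired more than once is a simple lock file, sketched below. This is an assumption about how one might guard against the duplicate runs, not what my script does; the function name, lock path handling, and the 300 second staleness window are all hypothetical.

```python
import os
import time

def acquire_run_lock(lock_path, max_age=300.0):
    """Return True if this invocation should do the salvage work.

    The first caller creates the lock file and wins. Later callers that
    find a recent lock back off. A lock older than max_age seconds is
    treated as left over from an earlier crash and is taken over
    (this takeover is not race-free; it is only a sketch).
    """
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            age = time.time() - os.path.getmtime(lock_path)
        except FileNotFoundError:
            # The lock vanished between the two calls; let the other run win.
            return False
        if age < max_age:
            return False
        os.utime(lock_path, None)  # refresh the timestamp and take over
        return True
    os.close(fd)
    return True
```

The script would call this first and exit quietly when it returns False, so the second firing does nothing.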
Without fastrestart all of my volumes are unavailable until salvage
completes.
With fastrestart I 1) very rarely have any offline volumes to deal with
and 2) return 99+ % [almost always 100%] of my users to full
functionality in less than a minute. (The rest have to suffer.)
In more than two years of doing this no one has noticed.
I don't think there's a black/white answer here -- for us fastrestart is
a huge win.
YMMV!
Kim
Derrick Brashear wrote:
> On Mon, May 26, 2008 at 6:38 PM, Robert Banz <rob@nofocus.org> wrote:
>
>>> From the conference:
>>> Why Derrick doesn't use fastrestart
>>> 1) You have to have something parse logfiles and salvage it
>>> 2) If you're running an inode fileserver, every time you salvage you crawl
>>> all of the inodes. You salvage 10 volumes, you're going through 10*<number
>>> of inodes>
>>>
>> I agree with Derek's analysis --
>>
>
>
> Dude, when did warlord comment?
>
>
>> that yes, in the event you'd really have to
>> salvage, you could salvage a lot.
>>
>> However, in my experience, salvaging has only been necessary in the face of
>> a hard system crash -- basically, problems with data written to the
>> filesystem out-of-order from what AFS thinks it should be, etc. If you're
>> unlucky to be running in an environment where your storage is unstable, or
>> your filesystem doesn't guarantee (or close to it) ordered writes, you've
>> got other problems. Though, I'd say it's very reasonable to salvage after a
>> hard crash -- perhaps that's a job for an init script, or the administrator
>> that was investigating the cause of the failure.
>>
>
> I want self-healing. Also, fileservers do themselves occasionally crash.
>
>
>> In most situations I ran into that required a fileserver
>> restart with prejudice (kill -9) -- things like thread lockups -- I've never
>> had to salvage, and fastrestart is a lifesaver when you have fileservers
>> with a good deal of data. Customers don't enjoy 30+ minutes of outage.
>>
>
> Me either. Even ignoring demand attach, namei plus well-tuned salvager
> is fast, though, so luckily I don't actually care.