[OpenAFS-devel] Re: [OpenAFS] Re: 1.6 and post-1.6 OpenAFS branch management and schedule

Thu, 17 Jun 2010 23:18:27 +0200

On Thursday, 17 June 2010 22:54:25, Andrew Deason wrote:
> On Thu, 17 Jun 2010 22:01:08 +0200, Christof Hanke <christof.hanke@rzg.mpg.de> wrote:
> > On Thursday, 17 June 2010 21:30:23, Andrew Deason wrote:
> > > And in particular, NTFS and other journalled filesystems have the
> > > advantage of a journal, and probably lots of other similarly helpful
> > > things. Guess what we do not have.
> >
> > Right, this is actually Hartmut's point. We have user volumes spread
> > out over a few fileservers, so each partition of such a server holds
> > hundreds of user home volumes. It is very painful for the admin and
> > his phone if these are not back up quickly. Since they are user home
> > volumes, they are active, so DAFS does _not_ help here at all. That's
> > why we accept the risk of some minor corruption rather than have some
> > hundred people unable to work at all (or simulation jobs crash, or
> > whatnot).
>
> I am not the one you need to convince about this line of thinking, but
> I'm not sure I agree that DAFS is no help here at all. When you say the
> volumes are "active", how active is "active"? DAFS gives you some
> configuration knobs that let you specify how long a volume must be
> inactive before it is detached. The default is 2 hours, but you could
> possibly set it much lower than that.
>
I know I don't have to convince you.
Well, these are home volumes. I do hope that my users touch their home
volumes more often than once every 2 hours.

> That has other performance implications and I'm not necessarily
> recommending doing that, but it's something to think about.
>
> > Thus, there is no real alternative for us there yet, unless DAFS
> > salvages some hundred volumes in parallel rather than one after the
> > other. Does it do that? That might alleviate the problem.
>
> You can salvage multiple volumes on a partition at once, but not
> hundreds, and not yet with the speed you want. There is some mostly
> complete code that allows a much higher number of salvages to run at
> once, but I'm pretty sure it will not be merged for the 1.6 release.
> I'm not sure about subsequent releases in the 1.6 series, but the way
> things are going now, it will probably be in 1.10 at the latest. And
> since 1.10 is the earliest release I've heard mentioned for removing
> fast-restart, perhaps that's not so bad for you.
>

Let's see. We have plenty of out-of-tree patches anyway, but I am trying to
reduce their number.

> And as has been mentioned elsewhere in the thread, you need to wait for
> the VG hierarchy summary scan to complete, no matter how fast salvaging
> is or how many salvages you run in parallel. That scan involves reading
> the headers of all volumes on the partition, so it's not fast (though it
> is very fast compared to the recovery time after a 1.4 unclean shutdown).

That's OK, I guess.