[OpenAFS-devel] Re: [OpenAFS] Re: 1.6 and post-1.6 OpenAFS branch management and schedule

Christof Hanke christof.hanke@rzg.mpg.de
Thu, 17 Jun 2010 22:01:08 +0200


On Thursday, 17 June 2010 21:30:23, Andrew Deason wrote:
> On Thu, 17 Jun 2010 11:59:29 -0700
>
> Russ Allbery <rra@stanford.edu> wrote:
> > "Christopher D. Clausen" <cclausen@acm.org> writes:
> > > I mean I occasionally see NTFS errors in the event log on Windows
> > > servers. Windows doesn't take the disk offline and run a chkdsk for
> > > me to prevent potential errors, it allows me to try and access other
> > > data and if it works there are no problems and denies access to
> > > specific files or directories if there is corruption.
> >
> > I'm quite sure that, after an unclean crash, your Windows server
> > doesn't remount the file system without doing a consistency check.  No
> > operating system treats its file systems that way.
>
> And in particular, NTFS and other journalled filesystems have the
> advantage of a journal, and probably lots of other similarly helpful
> things. Guess what we do not have.

Right, this is actually Hartmut's point.
We have user volumes spread out over a few fileservers, so each partition
of such a server holds hundreds of user home volumes.
It is very painful for the admin and his phone if these are not back up
quickly.
Being user home volumes, they are active, so DAFS does _not_ help here at all.
That's why we accept the risk of some minor corruption rather than knowing that
several hundred people cannot work at all (or that simulation jobs crash, or
whatnot).
Thus, there is no real alternative for us yet.
Unless, that is, DAFS salvages a few hundred volumes in parallel rather than
one after the other.
Does it do that?
That might alleviate the problem.

Christof