[OpenAFS] Re: kernel panics with 1.6.0 and 1.6.1pre2 on
Logan O'Sullivan Bruns
Fri, 27 Jan 2012 12:26:30 -0800
Thanks. Responses inline.
On Fri, Jan 27, 2012 at 02:11:44PM -0600, Andrew Deason wrote:
> On Fri, 27 Jan 2012 11:31:47 -0800
> "Logan O'Sullivan Bruns" <email@example.com> wrote:
> > I haven't seen them since switching to 1.6.1pre2 but I've also
> > carefully avoided the use case which was causing me problems. The
> > salvager crash is definitely fixed in 1.6.1pre2. The one I'm not sure
> > about is recovery from a crash during an initial clone. What was
> > reproducible for me with 1.6.0 was that if I did addsite to a dafs
> > managed partition, started a release but then crashed the system (due
> > to the aforementioned panic), then when I tried to release it again it
> > would say it was deleting an extant clone then it would fail to
> > restore the dump and the release to that site would fail. The
> > corresponding server side log messages would be:
> > Volser: ReadVnodes: IH_CREATE: File exists - restore aborted
> Okay, yeah, this isn't platform-specific or anything; sorry, I thought
> you meant that the fileserver or volserver just crashed on its own or
> something and caused problems.
> > It seems almost as if it wasn't cleaning up the never fully released
> > volume properly. Once it got into that state it would never succeed
> > for that volume. Even if I did a vos remove.
> What about a manual salvage? I think this is a known issue, but it was
> 'fixed' by causing a salvage to occur automatically. I believe the fixes
> are on 1.6.x, and so should be in 1.6.1pre2 (I'm thinking of gerrit 6080
> and 6286 here).
I did, but I was having trouble with the salvager in 1.6.0. I imagine it is
probably fixed in 1.6.1pre2 as you said. Thanks for the details.
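For anyone wanting to check whether a volume has hit the same restore failure, here is a minimal sketch that scans volserver log lines for the message quoted above. The function name and the pattern constant are hypothetical, made up for illustration; the only thing taken from this thread is the log text itself. It assumes the message appears on a single line and that you pass it the lines of your own VolserLog (the path depends on your installation).

```python
import re

# Assumption: the failure always logs this exact text, as quoted in this thread.
ABORT_PATTERN = re.compile(
    r"ReadVnodes: IH_CREATE: File exists - restore aborted"
)


def find_aborted_restores(lines):
    """Return (line_number, text) pairs for lines containing the abort message.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if ABORT_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the log and pass the file object in, e.g. find_aborted_restores(open(path_to_volserlog, errors="replace")), where path_to_volserlog is whatever location your server writes its VolserLog to. An empty result only means the message is not in that file; it says nothing about whether the leftover clone was cleaned up.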
> > > The head of the 1.4.x branch should have support for solaris 11, so if
> > > you just want something else to try, it's not inconceivable that that
> > > could be better. I don't expect it to be any different, though.
> > I wasn't aware that the 1.4.x branch had the changes to avoid the afs
> > / unlinkat syscall conflict? I'll give that a try.
> No, 1.4.x does not have that. However, Oracle Solaris 11 renumbered
> their syscalls so 65 is not unlinkat anymore, because running OpenAFS
> binaries compiled for Solaris 10 could cause files to become unlinked
> accidentally. Did illumos never do that as well? We informed them about
> it (or rather, Frank from Oracle did).
> If syscall 65 is open, 1.4.x might still work. Otherwise it will not.
Unfortunately not in OpenIndiana 151a at least.
> Andrew Deason
> OpenAFS-info mailing list