[OpenAFS] Re: kernel panics with 1.6.0 and 1.6.1pre2 on openindiana

Andrew Deason adeason@sinenomine.net
Fri, 27 Jan 2012 14:11:44 -0600

On Fri, 27 Jan 2012 11:31:47 -0800
"Logan O'Sullivan Bruns" <logan@gedanken.org> wrote:

> I haven't seen them since switching to 1.6.1pre2 but I've also
> carefully avoided the use case which was causing me problems. The
> salvager crash is definitely fixed in 1.6.1pre2. The one I'm not sure
> about is recovery from a crash during an initial clone. What was
> reproducible for me with 1.6.0 was that if I did an addsite to a
> DAFS-managed partition, started a release, but then crashed the system (due
> to the aforementioned panic), then when I tried to release it again it
> would say it was deleting an extant clone then it would fail to
> restore the dump and the release to that site would fail. The
> corresponding server side log messages would be:
> Volser: ReadVnodes: IH_CREATE: File exists - restore aborted

Okay, then this isn't platform-specific; sorry, I thought you meant that
the fileserver or volserver had crashed on its own and caused the
problems.

> It seems almost as if it wasn't properly cleaning up the never fully
> released volume. Once it got into that state, the release would never
> succeed for that volume, even if I did a vos remove.

Have you tried a manual salvage? I think this is a known issue, and it
was 'fixed' by making a salvage occur automatically. I believe the fixes
are on the 1.6.x branch, so they should be in 1.6.1pre2 (I'm thinking of
gerrit changes 6080 and 6286 here).

> > The head of the 1.4.x branch should have support for Solaris 11, so if
> > you just want something else to try, it's not inconceivable that it
> > could be better. I don't expect it to be any different, though.
> I wasn't aware that the 1.4.x branch had the changes to avoid the AFS /
> unlinkat syscall conflict? I'll give that a try.

No, 1.4.x does not have that. However, Oracle renumbered the syscalls in
Solaris 11 so that 65 is no longer unlinkat, because running OpenAFS
binaries compiled for Solaris 10 could cause files to be unlinked
accidentally. Did illumos never make the same change? We informed them
about it (or rather, Frank from Oracle did).

If syscall 65 is unused on your system, 1.4.x might still work;
otherwise, it will not.

Andrew Deason