[OpenAFS] Re: kernel panics with 1.6.0 and 1.6.1pre2 on openindiana

Logan O'Sullivan Bruns logan@gedanken.org
Fri, 27 Jan 2012 11:31:47 -0800


Thanks for the quick reply. Some comments inline.

On Fri, Jan 27, 2012 at 12:59:39PM -0600, Andrew Deason wrote:
> On Fri, 27 Jan 2012 10:34:45 -0800
> "Logan O'Sullivan Bruns" <logan@gedanken.org> wrote:
> 
> > I'm setting up a couple of machines with OpenIndiana 151a. I'm adding
> > them to an existing cell. The file server side is working pretty well.
> > I had some problems with the salvager crashing which were fixed by
> > updating from 1.6.0 to 1.6.1pre2 and occasional problems with deleting
> > of extant clones not working properly which could be worked around.
> 
> I would like to hear more about this if any problems still exist. I have
> trouble seeing what could be different between solaris and oi that would
> cause server stuff to not work.

I haven't seen them since switching to 1.6.1pre2, but I've also carefully
avoided the use case that was causing me problems. The salvager crash is
definitely fixed in 1.6.1pre2. The one I'm not sure about is recovery
from a crash during an initial clone. What was reproducible for me with
1.6.0 was this: if I did an addsite to a dafs-managed partition and started
a release, but the system then crashed (due to the aforementioned panic),
the next attempt to release would say it was deleting an extant clone,
then fail to restore the dump, and the release to that site would fail.
The corresponding server-side log message would be:

Volser: ReadVnodes: IH_CREATE: File exists - restore aborted

It seems almost as if it wasn't cleaning up the never fully released volume
properly. Once it got into that state, the release would never succeed for
that volume, even if I did a vos remove. I could only add it to another
/vicepxx. Note that if the first release succeeded but a subsequent release
got interrupted, I didn't see the problem. I don't think I would have seen
the problem at all if it weren't for the kernel panics.
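
For anyone who wants to check whether their own volserver log shows the same
failure, here is a minimal sketch in Python. The function name is
hypothetical, and the log location is deliberately left to the caller since
it varies by installation. The only thing taken from the report above is the
message text itself.

```python
import re

# Message text copied from the server-side log line quoted above.
RESTORE_ABORTED = re.compile(
    r"ReadVnodes: IH_CREATE: File exists - restore aborted"
)


def find_aborted_restores(lines):
    """Return (line_number, text) pairs for lines reporting the aborted restore.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if RESTORE_ABORTED.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file object for the volserver log to find_aborted_restores
gives back the matching lines with their line numbers. It only detects the
symptom and says nothing about which volume or partition was affected.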

> > However I haven't been able to get the client side to run under
> > load without a kernel panic.
> [...]
> > The general configuration is the same as I've used on a few solaris 10
> > sparc systems with the 1.4.X series for a number of years. I'm not
> > sure it is optimal but it has worked well for me. Basically a 2G ufs
> > zvol for the cache with parameters like this:
> 
> A UFS zvol is the best way, yes.
> 
> > Any tips for workarounds or whether it is worth trying the latest
> > source from git would be appreciated.
> 
> Probably not. I'm not aware of anyone trying to run openafs on
> openindiana; I've been meaning to look into it and I have a vm
> specifically for that ready to go... I just never got around to it. This
> email has given me a reason to look at it again, though, so I may have a
> better answer for you soon.

Yeah, I understand that OpenIndiana is off the beaten path but I was hoping
it was close enough to OpenSolaris to work the same. If you do get a chance
to look that'd be great. Thanks.

> The head of the 1.4.x branch should have support for solaris 11, so if
> you just want something else to try, it's not inconceivable that that
> could be better. I don't expect it to be any different, though.

I wasn't aware that the 1.4.x branch had the changes to avoid the afs /
unlinkat syscall conflict. I'll give that a try.

Thanks,
  logan

