[OpenAFS] OpenAFS 1.3.87 and 1.4.0-rc6 stability issues on Solaris 10

Logan O'Sullivan Bruns logan@gedanken.org
Tue, 11 Oct 2005 09:39:18 -0700


For whatever it's worth, I was able to reproduce the behavior you
described with 1.4.0-rc4 on Solaris 10 (SPARC).

Another problem I've seen consistently on Solaris 10 is that the
system panics if the NFS server is put under load while the AFS kernel
module is loaded. I first ran into this while doing a net install of
Solaris 10 on a Netra without a CD drive. I tried it several times
with AFS loaded and it panicked every time; when I disabled the
script that loads the AFS module, the install worked fine. Anyway,
that's another thing to check if an AFS developer wants to spend some
time testing on Solaris 10.

  - logan

On Tue, Oct 11, 2005 at 06:11:08PM +0200, Loic Tortay wrote:
> Hello,
> I'm facing a somewhat severe stability problem with OpenAFS 1.3.87 and
> 1.4.0-rc6 on Solaris 10, on both i386 and Sparc.
> 
> Using one of the new "SMF" commands can easily trigger a panic on
> Solaris 10 when OpenAFS is running.
> 
> Specifically, the problem happens when running the "svcs -p" command
> when (and only when) OpenAFS is up and running.
> 
> About one time out of three, the system panics immediately.
> 
> The panic message is always the same, as is the stack trace, and it
> happens on both i386 (actually AMD64 booted in 32-bit mode) and Sparc
> (in 64-bit mode).
> 
> Besides that, access to AFS works correctly.
> 
> The stack trace on a V40z running Solaris 10 in i386/32-bit
> mode is:
>  # mdb unix.10 vmcore.10
>  Loading modules: [ unix krtld genunix specfs ufs ip sctp usba fctl lofs nfs random ptm ]
>  > $c
>  contract_process_status+0x126(d1411c40, fed43330, 2, d0f10348, d0aa4e7c, 100000)
>  ctfs_stat_ioctl+0x9d()
>  fop_ioctl+0x1e(d1c21300, 63747300, 809dd88, 102001, d12ced90, d0aa4f80)
>  ioctl+0x199()
>  sys_sysenter+0xdc()
>  >
> 
> The problem also occurs with the latest Solaris 10 recommended patch
> cluster; the kernel release on the above-mentioned machine is
> "Generic_118844-08" (it also happens with older kernel releases, and
> with kernel releases up to and including "Generic_118822-18" on Sparc).
> 
> OpenAFS is compiled with Sun Studio 10 (with the following
> patches on i386: 117831-03, 117837-05, 117846-07 and 118682-01).
> 
> The "configure" options used are "--enable-transarc-paths" and
> "--with-afs-sysname=sunx86_510" ("--with-afs-sysname=sun4x_510" on
> Sparc).
> 
> I have several "crash dumps" on both i386 and Sparc if needed.
> 
> The problem does not occur without AFS: I've run a simple
> "while :; do svcs -p > /dev/null;done" loop for about 24 hours (more
> than 1.6 million calls to "svcs -p") without a panic.
> 
> The same loop needs less than one second to trigger a panic with
> OpenAFS running.
> 
> 
> I can't find anything related to this in either the list archives or
> on Google.
> 
> I've found a few reports of people actually running OpenAFS on
> Solaris 10, including people running AFS cells on Solaris 10 servers,
> but none mentioning such an issue.
> 
> So my question is: am I the only one with this issue?
> 
> If so, does anyone have a clue where to look for the origin of this
> problem?
> 
> 
> Loïc.
> -- 
> | Loïc Tortay <tortay@cc.in2p3.fr> -     IN2P3 Computing Centre     |
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info