[OpenAFS] Stopping afsd on Solaris?

Sergio Gelato Sergio.Gelato@astro.su.se
Fri, 28 Oct 2005 17:00:17 +0200

* Derrick J Brashear [2005-10-28 08:51:46 -0400]:
> On Fri, 28 Oct 2005, Sergio Gelato wrote:
> >Precisely. Having recently tried to upgrade OpenAFS on a Solaris 8
> >test system via the modunload route, I can say that if AFS is in active
> >use there is a good chance of the modunload approach triggering a
> >kernel panic. So I prefer to upgrade with a clean reboot instead.
> umount should fail if something is accessing AFS. If umount fails, afsd 
> -shutdown should fail, as should unloading the module.

Well, then it probably depends on which version one is upgrading from.
I don't suppose 1.2.9 was bug-free? But point taken: if I ever reproduce
this with the current version I'll file a bug report.

... I think I just reproduced my problem with 1.4.0-rc4. Will try a newer RC
(or 1.4.0 final) one of these days. Here is what I did:

# /etc/init.d/afs unload
Killing inetd.afs
Unmounting /afs
Unloading afs module (98)
afsd: Shutting down all afs processes and afs state

# mount
/ on /dev/dsk/c0t0d0s0 read/write/setuid/intr/largefiles/onerror=panic/dev=2200000 on Sun Oct 9 01:14:59 2005
/proc on /proc read/write/setuid/dev=3e40000 on Sun Oct  9 01:14:57 2005
/dev/fd on fd read/write/setuid/dev=3f00000 on Sun Oct  9 01:15:01 2005
/etc/mnttab on mnttab read/write/setuid/dev=4000000 on Sun Oct  9 01:15:06 2005
/var/run on swap read/write/setuid/dev=1 on Sun Oct  9 01:15:06 2005
/tmp on swap read/write/setuid/dev=2 on Sun Oct  9 01:15:08 2005

(plus some other stuff but no AFS)

# modinfo | grep afs
(no output)
# pgrep -fl afsd
(system panics and reboots)

From the logs: BAD TRAP: type=31 rp=2a100591430 addr=78128c48 mmu_fsr=0
unix:die+a4 (31, 2a100591430, 78128c48, 0, 2a100591430, e058a030)
unix:trap+8b8 (78128000, 1, 6, 0, 2a100591430, 0)
unix:sfmmu_tsb_miss+66c (104293b0, 0, 3000004ff88, 0, 3000004ff88, 0)
unix:prom_rtt+0 (30000c10e80, 2a100591588, 2a100591580, 3000038d328, 1000c36c, 0)
genunix:rm_assize+120 (42c8, 1, 3000117e2a8, 1fff, 104305a8, 300004f9c88)
procfs:prgetpsinfo32+37c (300004f9c88, 2a100591770, 30001077520, 30001077520, 2a100591770, 30001383b20)
procfs:pr_read_psinfo_32+3c (30001077520, 2a100591978, 30001db23a0, 3000038d328, 30001fae0a0, 30001fae000)
genunix:read+25c (0, 0, 1, 30001512e70, 4, 150)
genunix:read32+30 (4, ffbef2f8, 150, ff33c008, 21440, 1176c)

This was SunOS Release 5.8 Version Generic_117350-25 64-bit.

All right, the panic is actually triggered by pgrep; if I skip that step
and do an "/etc/init.d/afs start" instead, I am at least sometimes able
to get by without a panic. So my initial claim wasn't accurate: the
problem manifests itself even when AFS is not in active use and the
module has been unloaded normally.

From a practical point of view, this is still annoying since I have no
control over what background processes might look at psinfo during the 
time when no afs module is loaded.
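
For scripts one does control, a guard along the following lines might at
least avoid reading psinfo while the module is unloaded. This is only a
sketch: afs_module_listed is a hypothetical helper name (not part of
OpenAFS), it relies on the same "modinfo | grep afs" check used above, and
I have not verified that skipping pgrep in this state reliably avoids the
panic.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenAFS): reads modinfo-style output on
# stdin and succeeds only if some line mentions "afs".
afs_module_listed() {
    grep afs >/dev/null
}

# Intended use on the Solaris box (assumes modinfo and pgrep are in PATH):
#
#   if modinfo | afs_module_listed; then
#       pgrep -fl afsd
#   else
#       echo "afs module not loaded; skipping pgrep" >&2
#   fi
```

This does nothing about background processes that read psinfo on their
own, which is the real annoyance.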

I have no idea whether OpenAFS or Solaris is to blame for this one.