[Port-solaris] Re: Kernel panic after a few minutes

Andrew Deason adeason@sinenomine.net
Mon, 24 Oct 2011 12:49:51 -0500


On Mon, 24 Oct 2011 08:00:23 +0200
Sebastian Hanigk <shanigk@fs.tum.de> wrote:

> I have installed OpenAFS 1.6.0 on a test client running Solaris 10
> with the newest patches, but after a few minutes of file system usage,
> the kernel panics.

I'm inferring from your other comments that this is easily reproducible?
That is, it happens within a few minutes every time you bring up the
client?

Could you provide 'uname -a' so we know exactly what kind of kernel?

> For startup, I'm using the OpenAFS SMF scripts from Mathias Feiler
> (Uni Hohenheim,
> <https://www.uni-hohenheim.de/~feiler/wiki/aktiv/doku.php?id=sys:var:sol:zoned_afs_server:afs-client_installation>),

It sure would be neat if people contributed some of this stuff back to
OpenAFS...

Also, that stuff claims to be for 1.4, though I cannot seem to get to
any of the paths mentioned for the SMF, so I don't know if it makes a
difference.

> the afsd startup command is "/usr/vice/etc/afsd -stat 2000 -dcache 800
> -daemons 3 -volumes 70 -afsdb -backup". On a side note: as I
> understand it, this should start 3 daemons, ps output lists 9
> processes.

The -daemons option configures how many "background daemon" processes to
use. There are several other daemon processes that get run no matter
what.

> As I'm not quite used to kernel debugging, perhaps some of you can
> shed some light on the matter. Its sister machine running as AFS
> server without client functionality runs perfectly.

Do you have a vmcore and unix file (or a 'vmdump') in /var/crash? (Or
probably several by now.) And would you be willing to provide them? These
may contain sensitive information, as they contain all of the memory of
the running system at that time, but if I could look at them myself, it
would be quicker to see what's going on.

Otherwise, we can talk you through some things to look at. One quick
thing to check is the versions in play; run

strings /usr/vice/etc/afsd | grep built

to verify the version of afsd, and run

rxdebug <client machine> 7001 -version

to verify the version of the running client kernel module. To see the
version number of the kernel module that was in use when the machine
panic'd, load the vmcore in mdb (run 'mdb -k unix.0 vmcore.0', or
whatever the unix and vmcore names are) and run
'cml_version_number/64c' like so:

# mdb -k unix.0 vmcore.0 
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp zfs mpt ip hook neti sctp arp usba fctl lofs audiosup cpc fcip random crypto ptm ufs nfs ]
> cml_version_number/64c
cml_version_number:
cml_version_number:             @(#) OpenAFS 1.6.0[...]


I'd also be interested in the output of 'afs_setTime/U' and
'afs_setTimeHost/K' from the same mdb session.
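
If it's easier, the checks above could be bundled into one script. This
is only a rough sketch, not something shipped with OpenAFS: the function
name collect_afs_info is made up, it assumes a POSIX shell, and it
assumes the dump files are named unix.0 and vmcore.0 in the directory
you pass in. Adjust to whatever is actually on your system.

```shell
#!/bin/sh
# Sketch only: gather the version info discussed above in one pass.
# collect_afs_info is a hypothetical helper name; the dump file names
# unix.0 and vmcore.0 are assumptions, so adjust them to match your system.
collect_afs_info() {
    crashdir=$1     # directory holding the crash dump files
    client=$2       # hostname of the AFS client machine

    echo "== uname -a =="
    uname -a

    echo "== afsd build string =="
    if [ -f /usr/vice/etc/afsd ]; then
        strings /usr/vice/etc/afsd | grep built
    else
        echo "skipped: /usr/vice/etc/afsd not found"
    fi

    echo "== running kernel module version =="
    if command -v rxdebug >/dev/null 2>&1; then
        rxdebug "$client" 7001 -version
    else
        echo "skipped: rxdebug not in PATH"
    fi

    echo "== values from the crash dump =="
    if [ -f "$crashdir/unix.0" ] && [ -f "$crashdir/vmcore.0" ]; then
        # mdb reads its commands from standard input when that is not a terminal
        printf '%s\n' 'cml_version_number/64c' 'afs_setTime/U' 'afs_setTimeHost/K' |
            mdb -k "$crashdir/unix.0" "$crashdir/vmcore.0"
    else
        echo "skipped: no unix.0/vmcore.0 in $crashdir"
    fi
}
```

You would call it as 'collect_afs_info <crash directory> <client machine>'
and send along whatever it prints. Sections whose tools or files are
missing are reported as skipped rather than failing.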

-- 
Andrew Deason
adeason@sinenomine.net