[Port-solaris] Re: Kernel panic after a few minutes
Sebastian Hanigk
shanigk@fs.tum.de
Mon, 24 Oct 2011 21:47:40 +0200
On 24.10.2011 at 19:49, Andrew Deason wrote:
Hello Andrew,
> I'm inferring from your other comments that this is easily reproducible?
> That is, it happens every time you bring up the client after a few
> minutes.
Well, first and foremost, I seem to have solved the problem. In a
nutshell, the time-setting feature of afsd was causing the trouble. With
it deactivated, the machine has been running flawlessly (at least so
far); NTPv4 now handles time synchronisation.
Another peculiarity of this system was the UFS cache partition, which had
been mounted with the logging option (copied and pasted from the
uni-hohenheim scripts); a quick search showed that this mount option has
been discouraged, at least in the past. I first remounted the cache
partition with the nologging option, but that alone did not prevent the
crashes.
> Could you provide 'uname -a' so we know exactly what kind of kernel?
SunOS sdrsim02 5.10 Generic_147441-01 i86pc i386 i86pc
> Also, that stuff claims to be for 1.4, though I cannot seem to get to
> any of the paths mentioned for the SMF, so I don't know if it makes a
> difference.
It is mostly a conversion of the afs.rc script to Solaris' SMF framework;
the method file uses the rc script verbatim in most places.
> Do you have a vmcore and vmunix (or a 'vmdump') in /var/crash ? (Or
> probably several by now) And would you be willing to provide them? These
> may contain sensitive information, as they contain all of the memory of
> the running system at that time, but if I had them to look at myself, it
> would make it easier to see what's going on more quickly.
I have to check with my boss, but it should be no problem to provide the
crash images privately.
> Otherwise, we can talk you through some things to look at. One quick
> thing to check are the versions in play; run
>
> strings /usr/vice/etc/afsd | grep built
@(#) OpenAFS 1.6.0 built 2011-10-23
Built from source with the Solaris Studio compilers (12.2).
The original binaries from the openafs.org site showed the same problem;
strings on that binary gives:
@(#) OpenAFS 1.6.0 built 2011-08-16
> rxdebug <client machine> 7001 -version
AFS version: OpenAFS 1.6.0 built 2011-10-23
> to verify the version of the running client kernel module. To see the
> version number of the kernel module that was in use when the machine
> panic'd, load the vmcore in mdb (run 'mdb -k unix.0 vmcore.0', or
> whatever the unix and vmcore names are) and run
> 'cml_version_number/x64c' like so:
bash-3.2# mdb -k unix.6 vmcore.6
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp cpu.generic zfs sockfs ip hook neti sctp arp usba uhci fctl nca lofs cpc random crypto fcip ptm ufs sppp nfs ]
> cml_version_number/x64c
cml_version_number:
cml_version_number: 2840 #) OpenAFS 1.6.0 built 2011-10-23 ?
??
???
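For anyone puzzled by that output: as far as I understand mdb's format characters, `x` prints the next two bytes as one hexadecimal 16-bit value and `c` prints single characters, so the leading `2840` should simply be `@(` read as a little-endian word on x86. The small Python sketch below only reproduces that interpretation; the string is reconstructed by hand from the output above, it does not read the dump.

```python
import struct

# mdb's "x" format prints the next 2 bytes as one 16-bit hex value. x86 is
# little-endian, so the bytes 0x40 '@' and 0x28 '(' read back as 0x2840.
first_two = struct.pack("<H", 0x2840)
print(first_two)  # b'@('

# Start of cml_version_number, reconstructed by hand from the mdb output.
version = b"@(#) OpenAFS 1.6.0 built 2011-10-23"
assert version[:2] == first_two

# "64c" then prints 64 single characters from byte 2 onwards. Only 33 of
# them belong to the visible text; the other 31 are the terminating NUL and
# whatever bytes follow it in memory, presumably the "?" lines above.
tail = version[2:]
print(len(tail))  # 33
```

In other words, the crash dump's kernel module carries the same "OpenAFS 1.6.0 built 2011-10-23" stamp as my afsd.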
> I'd also be interested in 'afs_setTime/U' and 'afs_setTimeHost/K'.
> afs_setTime/U
afs_setTime:
afs_setTime: 1
> afs_setTimeHost/K
afs_setTimeHost:
afs_setTimeHost:ffffffffb1d91af0
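To double-check that the afsd binary, the running kernel module (rxdebug) and the module captured in the crash dump all come from the same build, the stamps above can be compared mechanically. This Python sketch is only an illustration: the helper name `parse_build` and the regular expression are my own assumptions, not part of any OpenAFS tool, and the stamps are pasted in by hand.

```python
import re

# Hypothetical helper: pull the version and build date out of an OpenAFS
# stamp such as "@(#) OpenAFS 1.6.0 built 2011-10-23".
BUILD_RE = re.compile(r"OpenAFS\s+(\S+)\s+built\s+(\d{4}-\d{2}-\d{2})")


def parse_build(text):
    """Return (version, build_date), or None if no OpenAFS stamp is found."""
    m = BUILD_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


# Stamps as reported in this mail.
stamps = {
    "afsd (strings)": "@(#) OpenAFS 1.6.0 built 2011-10-23",
    "running module (rxdebug)": "AFS version: OpenAFS 1.6.0 built 2011-10-23",
    "crashed module (vmcore.6)": "@(#) OpenAFS 1.6.0 built 2011-10-23",
}
builds = {name: parse_build(s) for name, s in stamps.items()}

for name, build in builds.items():
    print(f"{name:28} {build}")
# True when every stamp yields the same (version, date) pair.
print("all identical:", len(set(builds.values())) == 1)
```

The same helper would report ("1.6.0", "2011-08-16") for the openafs.org binary, which showed the same crash.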
If I can provide more information, please let me know!
Best regards,
Sebastian