[Port-solaris] Re: Kernel panic after a few minutes

Sebastian Hanigk shanigk@fs.tum.de
Mon, 24 Oct 2011 21:47:40 +0200


On 24.10.2011 at 19:49, Andrew Deason wrote:

Hello Andrew,

> I'm inferring from your other comments that this is easily reproducible?
> That is, it happens every time you bring up the client after a few
> minutes.

Well, first and foremost, I seem to have solved the problem. In a
nutshell, the time-setting feature of afsd was causing the trouble.
With it deactivated, the machine runs flawlessly (at least so far ...);
NTPv4 now handles time synchronisation.
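
For readers who want to try the same change: the message does not say how the time-setting feature was switched off. One documented way in afsd of this vintage is the -nosettime switch. The sketch below only builds and prints the command line; the AFSD_OPTIONS variable name is an assumption and may not match the rc or SMF method script in use.

```shell
# Hypothetical variable; the real rc or SMF method script may use another name.
AFSD_OPTIONS="-nosettime"

# Build and print the command line instead of executing it,
# so the sketch is safe to run on any machine.
afsd_cmd="/usr/vice/etc/afsd $AFSD_OPTIONS"
echo "$afsd_cmd"
```

With time setting disabled in afsd, something else (here NTPv4) has to keep the clock in sync.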

Another peculiarity of this system is the UFS cache partition, which
was mounted with the logging option (copied and pasted from the
uni-hohenheim scripts ...); a short search showed that this mount
option has been frowned upon, at least in the past. I first remounted
the cache partition with the nologging option, but that alone did not
prevent the crashes.

> Could you provide 'uname -a' so we know exactly what kind of kernel?

SunOS sdrsim02 5.10 Generic_147441-01 i86pc i386 i86pc

> Also, that stuff claims to be for 1.4, though I cannot seem to get to
> any of the paths mentioned for the SMF, so I don't know if it makes a
> difference.

Actually, it is mostly a conversion of the afs.rc script to Solaris'
SMF framework. The method file uses the rc script verbatim in most
places.

> Do you have a vmcore and vmunix (or a 'vmdump') in /var/crash ? (Or
> probably several by now) And would you be willing to provide them?
> These may contain sensitive information, as they contain all of the
> memory of the running system at that time, but if I had them to look
> at myself, it would make it easier to see what's going on more
> quickly.

I have to check back with my boss, but it should be no problem to
provide the crash images privately.

> Otherwise, we can talk you through some things to look at. One quick
> thing to check are the versions in play; run
>
> strings /usr/vice/etc/afsd | grep built

@(#) OpenAFS 1.6.0 built  2011-10-23

Built from source with the Solaris Studio compilers (12.2).

The original binaries from the openafs.org site showed the same
problem; strings on that binary gives

@(#) OpenAFS 1.6.0 built  2011-08-16

> rxdebug <client machine> 7001 -version

AFS version:  OpenAFS 1.6.0 built  2011-10-23
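
The point of this check is that the userland afsd and the loaded kernel module should come from the same build. A minimal sketch of that comparison follows, using the two strings quoted above as literals; the variable names are made up for illustration.

```shell
# Strings copied from the output quoted in this message.
afsd_build='@(#) OpenAFS 1.6.0 built  2011-10-23'
module_build='AFS version:  OpenAFS 1.6.0 built  2011-10-23'

# Drop everything up to and including "OpenAFS " so only
# the version number and build date remain.
afsd_ver=${afsd_build#*OpenAFS }
module_ver=${module_build#*OpenAFS }

if [ "$afsd_ver" = "$module_ver" ]; then
    result=match
else
    result=mismatch
fi
echo "$result: $afsd_ver"
```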

> to verify the version of the running client kernel module. To see the
> version number of the kernel module that was in use when the machine
> panic'd, load the vmcore in mdb (run 'mdb -k unix.0 vmcore.0', or
> whatever the unix and vmcore names are) and run
> 'cml_version_number/x64c' like so:

bash-3.2# mdb -k unix.6 vmcore.6
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp
cpu.generic zfs sockfs ip hook neti sctp arp usba uhci fctl nca lofs cpc
random crypto fcip ptm ufs sppp nfs ]
> cml_version_number/x64c
cml_version_number:
cml_version_number:             2840    #) OpenAFS 1.6.0 built  2011-10-23 ?
                                                                            ??
                ???
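
A note on reading this output: the leading 2840 is not a separate version field. The x format prints one 16-bit word in hexadecimal, and on this little-endian i86pc machine that word is just the first two bytes "@(" of the usual "@(#)" string; the 64c part then prints the following characters. A small sketch of the arithmetic (variable names are illustrative only):

```shell
# ASCII codes of the first two bytes of the "@(#)" what-string.
lo=$(printf '%d' "'@")   # 64 decimal = 0x40, byte at the lowest address
hi=$(printf '%d' "'(")   # 40 decimal = 0x28, byte at the next address

# A little-endian 16-bit read puts the second byte in the high half.
word=$(printf '%02x%02x' "$hi" "$lo")
echo "$word"
```

The trailing question marks are presumably unprintable bytes past the end of the string, since 64 characters are printed regardless of where the string stops.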


> I'd also be interested in 'afs_setTime/U' and 'afs_setTimeHost/K'.

> afs_setTime/U
afs_setTime:
afs_setTime:    1

> afs_setTimeHost/K
afs_setTimeHost:
afs_setTimeHost:ffffffffb1d91af0


If I can provide more information, please let me know!


Best regards,

Sebastian