[OpenAFS-devel] Solaris 10 6/06 (update 2) crash (not the gafs_rename() one)

William Setzer William_Setzer@ncsu.edu
Fri, 21 Jul 2006 10:25:33 -0400


Jeffrey Hutzelman <jhutz@cmu.edu> writes:
: <William_Setzer@ncsu.edu> wrote:
: 
: > I'm using the OpenAFS 1.4.1 distribution pre-compiled for Solaris 10
: > sparc.  Under Solaris 10 update 2, I get the following panic if the
: > machine is running as an NFS server and some (unknown) NFS request
: > comes in:
: 
: How long does it take this to happen?

It doesn't take long at all.  I'm mounting a remote root image (boot
net:dhcp -s) and it happens fairly early in the boot sequence.

:  Can you use tcpdump to capture the NFS traffic and figure out what
: request is triggering this?

Yep.  It's available at http://www4.ncsu.edu/~wsetzer/soldump.out if
you want to look at it.  As best I can tell, it seems to happen on the
first NFS lookup:

  09:59:47.968645 IP 152.1.4.163.2049 > 152.1.4.165.3706724788: reply ok 168
  09:59:47.969417 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 136 lookup fh 136,6/85559 "devices"
  09:59:51.684025 IP 152.1.4.165.3706724790 > 152.1.4.163.2049: 136 lookup fh 136,6/85559 "platform"
  09:59:52.783772 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 272 lookup fh 136,6/85559 "devices"
  10:00:02.413816 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 272 lookup fh 136,6/85559 "devices"
  10:00:21.663896 arp who-has 152.1.4.163 (ff:ff:ff:ff:ff:ff) tell 152.1.4.165
  10:00:22.663741 arp who-has 152.1.4.163 (ff:ff:ff:ff:ff:ff) tell 152.1.4.165

The "reply ok" packet is the last packet sent by the crashing machine,
and the "lookup /devices" is the first NFS lookup in the dump.
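
One way to read the dump: in tcpdump's NFS output, the number after
the client address is the RPC transaction id (xid) rather than a
port, so a repeated xid is a retransmission of the same request.  The
following is only a minimal sketch in Python, not part of the original
report; the names DUMP, LOOKUP and group_lookups are hypothetical, and
the regular expression assumes the exact line layout shown above.  It
groups the lookup requests by xid and name:

```python
import re
from collections import defaultdict

# Lookup request lines copied from the dump above (client -> server).
DUMP = """\
09:59:47.969417 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 136 lookup fh 136,6/85559 "devices"
09:59:51.684025 IP 152.1.4.165.3706724790 > 152.1.4.163.2049: 136 lookup fh 136,6/85559 "platform"
09:59:52.783772 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 272 lookup fh 136,6/85559 "devices"
10:00:02.413816 IP 152.1.4.165.3706724789 > 152.1.4.163.2049: 272 lookup fh 136,6/85559 "devices"
"""

# Assumed layout: "<time> IP <client-ip>.<xid> > <server>: <len> lookup fh <fh> "<name>""
LOOKUP = re.compile(
    r'^(?P<time>\S+) IP (?P<client>\d+\.\d+\.\d+\.\d+)\.(?P<xid>\d+) > '
    r'\S+: \d+ lookup fh \S+ "(?P<name>[^"]*)"$'
)


def group_lookups(text):
    """Map (xid, name) to the list of timestamps at which that lookup was sent."""
    seen = defaultdict(list)
    for line in text.splitlines():
        m = LOOKUP.match(line.strip())
        if m:  # lines that are not lookup requests are ignored
            seen[(m.group("xid"), m.group("name"))].append(m.group("time"))
    return dict(seen)


if __name__ == "__main__":
    for (xid, name), times in group_lookups(DUMP).items():
        print(f"xid {xid} lookup {name!r}: sent {len(times)}x at {', '.join(times)}")
```

Run against the lines above, the "devices" lookup (xid 3706724789)
shows up three times and the "platform" lookup once, with no reply to
either in the dump, which is consistent with the client retrying
against a server that has already gone down.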

: Are you running the AFS/NFS translator, or is the NFS server unrelated to 
: AFS?

It appears to be unrelated.  I'm not running the AFS/NFS translator,
but I'm using the "libafs64.o" kernel module.  When I do this with the
"libafs64.nonfs.o" module, the machine does not crash (as you would
probably expect).


William