[OpenAFS-port-darwin] Kernel panic from bug 41550 reproduced
Jonas Maebe
jonas.maebe@elis.ugent.be
Sat, 13 Oct 2007 18:54:43 +0200
Hello,
I've discovered a use-case with which I can fairly reliably reproduce
the kernel panic described in <http://rt.central.org/rt/Ticket/Display.html?id=41550>
What I don't understand is how to add new comments to that bug report
(is it only possible by sending a mail with a specially formatted
subject line to openafs-bugs@openafs.org or so?), but I guess that's
just me. Anyway:
I'm using a dual 1.8 GHz G5 with Mac OS X 10.4.10 and 3 GB of RAM.
At first I still had OpenAFS 1.4.2 installed, and got this kernel panic:
*********
Sat Oct 13 17:26:35 2007
panic(cpu 1 caller 0x000E8E00): remove_fsref: no named reference
Latest stack backtrace for cpu 1:
Backtrace:
0x000952D8 0x000957F0 0x00026898 0x000E8E00 0x6F12D150
0x6F127C88 0x000FB660 0x000E2424
0x000E1FB8 0x000EEC88 0x000EEEEC 0x000EEF8C 0x002AB548
0x000ABB30 0x00000000
Kernel loadable modules in backtrace (with dependencies):
org.openafs.filesystems.afs(1.4.2)@0x6f089000
Proceeding back via exception chain:
Exception state (sv=0x5A239000)
PC=0x900062AC; MSR=0x0200F930; DAR=0xE1285000;
DSISR=0x42000000; LR=0x00044118; R1=0xBFFFCC50; XCP=0x00000030 (0xC00 - System call)
Kernel version:
Darwin Kernel Version 8.10.0: Wed May 23 16:50:59 PDT 2007;
root:xnu-792.21.3~1/RELEASE_PPC
*********
After googling for "remove_fsref: no named reference" I found
<http://www.nabble.com/OpenAFS-1.4.2-crashing-on-Intel-Macs-(10.4)-t3137912.html>,
and from there got the link to the bug report mentioned above. The last
comment on that bug report mentions a commit of a possible fix. I checked
CVS, and it seems this commit should be in 1.4.5-pre1, so I downloaded and
installed that version. I still get the kernel panic, though:
*********
Sat Oct 13 17:44:17 2007
panic(cpu 0 caller 0x000E8E00): remove_fsref: no named reference
Latest stack backtrace for cpu 0:
Backtrace:
0x000952D8 0x000957F0 0x00026898 0x000E8E00 0x7175E2D4
0x71758D64 0x000FB660 0x000E2424
0x000E1FB8 0x000EEC88 0x000EEEEC 0x000EEF8C 0x002AB548
0x000ABB30 0x636E746C
Kernel loadable modules in backtrace (with dependencies):
org.openafs.filesystems.afs(1.4.5fc1)@0x716ba000
Proceeding back via exception chain:
Exception state (sv=0x719C4C80)
PC=0x900062AC; MSR=0x0000F930; DAR=0xE12B4000;
DSISR=0x42000000; LR=0x00044118; R1=0xBFFFCC70; XCP=0x00000030 (0xC00 - System call)
Kernel version:
Darwin Kernel Version 8.10.0: Wed May 23 16:50:59 PDT 2007;
root:xnu-792.21.3~1/RELEASE_PP
*********
Here's the symbolised version of the 1.4.5fc1 backtrace:
(gdb) x/i 0x000E8E00
0xe8e00 <vnode_removefsref+48>: lhz r0,44(r31)
(gdb) x/i 0x7175E2D4
0x7175e2d4 <afs_darwin_finalizevnode+976>: bl 0x7175e530 <afs_darwin_finalizevnode+1580>
(gdb) x/i 0x71758D64
0x71758d64 <afs_vop_lookup+844>: mr r0,r3
(gdb) x/i 0x000FB660
0xfb660 <VNOP_LOOKUP+144>: mr r30,r3
(gdb) x/i 0x000E2424
0xe2424 <lookup+500>: mr. r28,r3
(gdb) x/i 0x000E1FB8
0xe1fb8 <namei+588>: mr. r30,r3
(gdb) x/i 0x000EEC88
0xeec88 <access+300>: mr. r29,r3
(gdb) x/i 0x000EEEEC
0xeeeec <access+912>: lwz r0,488(r1)
(gdb) x/i 0x000EEF8C
0xeef8c <stat+52>: lwz r0,88(r1)
(gdb) x/i 0x002AB548
0x2ab548 <unix_syscall+756>: lwz r0,20508(r29)
(gdb) x/i 0x000ABB30
0xabb30 <shandler+272>: li r3,7
(the last address, 0x636E746C, appears to be bogus)
Now, here is how I can reproduce the panic: compile the run-time library
of the Free Pascal Compiler (fpc) with make -j 2, with the sources located
on a (remote) AFS volume, starting from the latest unstable version of the
compiler. (I haven't tried starting from the latest stable release, but
that one won't work very well with AFS anyway, because it has problems
with case-sensitive file systems under Mac OS X.)
One possibly interesting thing to note: fpc uses internal directory
caching, i.e., the first time it looks for a file in a directory, it
immediately reads all files and directories in that directory, adds their
names to an internal hash table, and uses that table from then on. It
performs this directory caching using opendir/readdir/closedir. So if two
instances of the compiler are running simultaneously (it's an SMP
machine), the opendir/readdir/closedir calls that the different compiler
processes make on the same directory can interleave in various ways.
If you want to try it on your own system (note: the following
sequence is *untested* and requires that svn is installed), do the
following *on an AFS volume*:
mkdir fpc
cd fpc
svn co -r 8765 http://svn.freepascal.org/svn/fpc/trunk/rtl rtl
curl -O http://www.elis.ugent.be/~jmaebe/ppcppc3.tbz
tar xjf ppcppc3.tbz
cd rtl/darwin
make FPC=`pwd`/../../ppcppc3 clean
make FPC=`pwd`/../../ppcppc3 OPT="-ap -XP" all -j 2
(don't do a single "make clean all -j 2", as the Makefile doesn't
specify ordering for the clean and all targets)
When it panics, it does so for me fairly early on (the system unit is
compiled by a single process, since everything else depends on it, but
from then on things run in parallel). It seems to happen more often the
second time you do this, i.e., when some things have already been cached,
so repeat the make clean and make all lines if it didn't panic the first
time.
Note that the supplied ppcppc3 is a PowerPC binary. I can also
provide an i386 binary if required (I still don't have an Intel Mac
myself, but I do have remote access to one).
I hope this helps track down the problem.
Jonas