[OpenAFS-devel] OpenAFS 1.4.0 rc3 crashes on Linux 2.6

Andy Lutomirski amluto@hotmail.com
Fri, 21 Oct 2005 01:09:37 +0000



>From: "chas williams - CONTRACTOR" <chas@cmf.nrl.navy.mil>
>To: "Andy Lutomirski" <amluto@hotmail.com>
>CC: openafs-devel@openafs.org
>Subject: Re: [OpenAFS-devel] OpenAFS 1.4.0 rc3 crashes on Linux 2.6
>Date: Wed, 12 Oct 2005 22:18:24 -0400
>
>In message <BAY106-F30200F494E413413103CB3C1870@phx.gbl>, "Andy Lutomirski" writes:
> >I frequently get the following crash.  I can trigger it most of the time by
> >running 'ls /afs/ir' (which is a symlink to /afs/ir.stanford.edu) after not
> >using afs for some time.
>
>what afs options are you using?  -dynroot? -fakestat?  approximately how
>long is some time?
>
> >ls            D ffff810019d0b440     0  3410   3361          3413 (NOTLB)
> >ffff810011eb7e38 0000000000000082 00000000005206a8 ffff81000db23d68
> >       ffff810008856e50 ffff8100088560b0 ffff810008856e50 ffff8100088562c8
> >       0000000000000000 ffffffff881992b0
> >Call Trace:<ffffffff881992b0>{:libafs:afs_linux_getattr+288} <ffffffff80389db6>{__down+198}
> >       <ffffffff8012d5a0>{default_wake_function+0} <ffffffff8038b9e4>{__down_failed+53}
> >       <ffffffff8018a250>{filldir+0} <ffffffff8018a5e9>{.text.lock.readdir+5}
> >       <ffffffff8018a3c2>{sys_getdents+130} <ffffffff8010f389>{error_exit+0}
> >       <ffffffff8010ea96>{system_call+126}
> >
> >Any ideas?  Any tests I can run to help debug this?
>
>you dont mention the machine platform, but i am going to guess x86_64?
>it would be helpful if you could do something like:
>
>gdb /wherever/the/afs/module/is.ko
>(gdb) info line *afs_linux_getattr+0x288
>
>and send along the output.  thanks.

Yes, this is x86_64.

I just re-triggered it with 1.4.0-rc6 (both userspace and kernel, although
the rc3 userspace part had been running with the rc6 libafs.ko for a while
without an intervening reboot).

Here's the OOPS:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at "/var/tmp/portage/openafs-kernel-1.4.0_rc6/work/op:131
invalid operand: 0000 [1] PREEMPT
CPU 0
Modules linked in: libafs ipt_conntrack iptable_nat ipt_REJECT ipt_state 
ip_conntrack ipt_multiport iptable_filter ip_tables snd_pcm_oss 
snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq 
snd_cs4281 snd_opl3_lib snd_hwdep snd_via82xx snd_ac97_codec snd_pcm 
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd 
soundcore xfs raid5 xor raid0
Pid: 1007, comm: ls Tainted: P      2.6.13-gentoo-r3
RIP: 0010:[<ffffffff8818a250>] <ffffffff8818a250>{:libafs:osi_Panic+0}
RSP: 0000:ffff810008575da0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: fffffffffffffffb RCX: 0000000000000000
RDX: ffff81001871af10 RSI: 00000000000a49c6 RDI: ffffffff881aacda
RBP: ffff81001f05ac00 R08: 0000000000000000 R09: ffff810008402d80
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100148c7000
R13: ffffffff881c30c0 R14: 00000000000a49c6 R15: 00000000000a49c6
FS:  00002aaaaaadbb00(0000) GS:ffffffff80502800(0000) knlGS:0000000040105940
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000051db78 CR3: 00000000047d7000 CR4: 00000000000006e0
Process ls (pid: 1007, threadinfo ffff810008574000, task ffff810010ca20b0)
Stack: ffffffff881953a8 0000000000000000 0000000000000000 ffff81001ba833f0
       0000000000000000 ffffc20000797d08 000000000051cb40 000000000051cb20
       ffffffff88145690 0000000000000001
Call Trace:<ffffffff881953a8>{:libafs:osi_UFSOpen+440} <ffffffff88145690>{:libafs:DRead+848}
       <ffffffff881535bd>{:libafs:afs_dir_GetBlob+13} <ffffffff8817043f>{:libafs:BlobScan+31}
       <ffffffff881988c1>{:libafs:afs_linux_readdir+1041} <ffffffff8011f869>{do_page_fault+1113}
       <ffffffff8018a250>{filldir+0} <ffffffff8018a0e6>{vfs_readdir+118}
       <ffffffff8018a3c2>{sys_getdents+130} <ffffffff8010f389>{error_exit+0}
       <ffffffff8010ea96>{system_call+126}

Code: 0f 0b a3 68 d6 1a 88 ff ff ff ff c2 83 00 c3 90 48 83 fe 01
RIP <ffffffff8818a250>{:libafs:osi_Panic+0} RSP <ffff810008575da0>

It looks like I don't have symbols for libafs.ko.  Grr.  I've rebuilt with 
symbols.

Take these gdb outputs with a grain of salt, since they come from the rebuilt
binary rather than the one that crashed.  Hopefully code generation is
deterministic enough that the offsets still line up...

Line 230 of "/var/tmp/portage/openafs-kernel-1.4.0_rc6/work/openafs-1.4.0-rc6/src/libafs/MODLOAD-2.6.13-gentoo-r3-SP/osi_vnodeops.c"
starts at address 0x558c1 <afs_linux_readdir+1041> and ends at 0x558c3 <afs_linux_readdir+1043>.
225          */
226         code = 0;
227         offset = (int) fp->f_pos;
228         while (1) {
229             dirpos = BlobScan(tdc, offset);
230             if (!dirpos)
231                 break;
232
233             de = afs_dir_GetBlob(tdc, dirpos);
234             if (!de)


Line 82 of "/var/tmp/portage/openafs-kernel-1.4.0_rc6/work/openafs-1.4.0-rc6/src/libafs/MODLOAD-2.6.13-gentoo-r3-SP/afs_vnop_readdir.c"
starts at address 0x2d43f <BlobScan+31> and ends at 0x2d442 <BlobScan+34>.
77          AFS_STATCNT(BlobScan);
78          /* advance ablob over free and header blobs */
79          while (1) {
80              pageBlob = ablob & ~(EPP - 1);  /* base blob in same page */
81              tpe = (struct PageHeader *)afs_dir_GetBlob(afile, pageBlob);
82              if (!tpe)
83                  return 0;           /* we've past the end */
84              relativeBlob = ablob - pageBlob;        /* relative to page's first blob */
85              /* first watch for headers */
86              if (pageBlob == 0) {    /* first dir page has extra-big header */
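For anyone following the BlobScan frame: lines 80 and 84 split a blob number into the first blob of its directory page and the blob's index within that page. A minimal sketch of that arithmetic follows. It assumes EPP (blobs per page) is a power of two; the value 64 is used purely for illustration, and page_base() and blob_in_page() are hypothetical helper names, not OpenAFS functions.

```c
#include <assert.h>

/* Assumed for illustration: blobs per directory page.  The mask in
 * page_base() only works when this is a power of two. */
#define EPP 64

/* First blob of the page holding ablob: "ablob & ~(EPP - 1)" (line 80). */
int page_base(int ablob)
{
    assert(ablob >= 0);
    assert((EPP & (EPP - 1)) == 0);  /* EPP must be a power of two */
    return ablob & ~(EPP - 1);
}

/* Index of ablob within its page: "ablob - pageBlob" (line 84). */
int blob_in_page(int ablob)
{
    return ablob - page_base(ablob);
}
```

For non-negative blob numbers the mask is just a cheaper way of writing (ablob / EPP) * EPP, and blob_in_page() equals ablob % EPP. A page base of 0 is the special first page whose header is larger, as line 86 notes.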


******* This one is even less likely to be correct, since it's an rc3 oops
decoded with rc6 symbols.  Sorry.

(gdb) info line *afs_linux_getattr+288
Line 672 of "/var/tmp/portage/openafs-kernel-1.4.0_rc6/work/openafs-1.4.0-rc6/src/libafs/MODLOAD-2.6.13-gentoo-r3-SP/osi_vnodeops.c"
starts at address 0x565d0 <afs_linux_getattr+288> and ends at 0x56610 <afs_linux_dentry_revalidate>.
(gdb) list *afs_linux_getattr+288
0x565d0 is in afs_linux_getattr (/var/tmp/portage/openafs-kernel-1.4.0_rc6/work/openafs-1.4.0-rc6/src/libafs/MODLOAD-2.6.13-gentoo-r3-SP/osi_vnodeops.c:672).
667             int err = afs_linux_revalidate(dentry);
668             if (!err) {
669                     generic_fillattr(dentry->d_inode, stat);
670     }
671             return err;
672     }
673     #endif
674
675     /* Validate a dentry. Return 1 if unchanged, 0 if VFS layer should re-evaluate.
676      * In kernels 2.2.10 and above, we are passed an additional flags var which
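A note on decoding these traces: the offsets in this 2.6.13 x86_64 oops format (e.g. `+288`) appear to be decimal, and the gdb session above bears that out. Line 672 starts at 0x565d0, which gdb labels `<afs_linux_getattr+288>`, so the function starts at 0x564b0; the next function, afs_linux_dentry_revalidate, begins at 0x56610, only 352 bytes later. An offset of `+0x288` (648 bytes) would therefore land past the end of afs_linux_getattr. The sketch below turns one trace frame into the matching `info line` command, keeping the offset in decimal. frame_to_gdb() is a hypothetical helper, and the frame format it accepts is assumed from the traces in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert one oops frame such as
 *   "<ffffffff881992b0>{:libafs:afs_linux_getattr+288}"
 * into "info line *afs_linux_getattr+288".  The offset is parsed as
 * decimal and written back as decimal, which is how gdb reads "+288".
 * Returns 0 on success, -1 if the frame is not in that format or the
 * output buffer is too small.
 */
int frame_to_gdb(const char *frame, char *out, size_t outlen)
{
    assert(frame != NULL && out != NULL);

    const char *open = strchr(frame, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    if (open == NULL || close == NULL)
        return -1;

    /* Skip an optional ":module:" prefix, present for module symbols. */
    const char *sym = open + 1;
    if (*sym == ':') {
        const char *colon = memchr(sym + 1, ':', (size_t)(close - sym - 1));
        if (colon == NULL)
            return -1;
        sym = colon + 1;
    }

    /* The symbol name runs up to the '+' that introduces the offset. */
    const char *plus = memchr(sym, '+', (size_t)(close - sym));
    if (plus == NULL || plus == sym)
        return -1;

    char *end;
    long off = strtol(plus + 1, &end, 10);  /* base 10: decimal offset */
    if (end == plus + 1 || end != close || off < 0)
        return -1;                          /* no digits, or trailing junk */

    int n = snprintf(out, outlen, "info line *%.*s+%ld",
                     (int)(plus - sym), sym, off);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Frames from the core kernel, such as `{__down+198}`, have no module prefix and are handled the same way.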

I'll reboot, run the new libafs with real symbols, and email again if it
triggers.

Thanks,
Andy
