[OpenAFS-devel] oops on AFS mount

John A. Goebel jgoebel@SLAC.Stanford.EDU
Sat, 11 May 2002 11:35:45 -0700


--Boundary_(ID_zXq3eNO7YhTejwbpiX6J/w)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
Content-disposition: inline


Hello,

We have a problem that _seems_ to be AFS related, although the oops
points to an EXT3 function.  The problem happens both on openafs 1.2.3
and release canidate 2, 1.2.4, but not a local filesystem directory,
nor an NFS mount point. The crash also happens if the machine that the AFS
space resides is running Transarc AFS or OpenAFS under Linux.

The test method is to run bonnie++ on a directory in AFS space. 

After doing a write that is large (>= 1 gig) in AFS space, when bonnie 
calls an unlink to the created file, there is an oops, and the
system wedges. While doing a strace on the process I get this:

temp11 kernel: Assertion failure in journal_write_metadata_buffer() at
journal.c
:406: "buffer_jdirty(jh2bh(jh_in))"

The cache space is at most 60% used, the disk has plenty of space left, 
and the network to the server side is fine. There is a simple log file
attached to this mail.

There was a similar problem the reported on the LKML, at least at the
functional level <http://linux.msede.com/lvm_mlist/archive/2000/11/0145.html>.

Attached is the oops, a messy ksymoops of the oops, an strace
on the running process, and a snapshot of the filesystem state just
before the crash.

Thanks for any help you can give.

John

##############################################
# John Goebel <jgoebel(at)slac.stanford.edu> #
# Stanford Linear Accelerator Center         #
# 2575 Sand Hill Road, Menlo Park, CA 94025  #
############################################ #

--Boundary_(ID_zXq3eNO7YhTejwbpiX6J/w)
Content-type: text/plain; charset=us-ascii; NAME=report
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=report


Message from syslogd@temp11 at Fri May 10 14:39:35 2002 ...
temp11 kernel: Assertion failure in journal_write_metadata_buffer() at journal.c:406: "buffer_jdirty(jh2bh(jh_in))"



temp11 login: Assertion failure in journal_write_metadata_buffer() at journal.c:406: "buffer_jdirty(jh2bh(jh_in))"
invalid operand: 0000
nfs lockd sunrpc libafs-2.4.18-4smp.mp e100 ext3 jbd sym53c8xx sd_mod scsi_mod
CPU:    0
EIP:    0010:[<f884f9b4>]    Tainted: PF
EFLAGS: 00010282

EIP is at journal_write_metadata_buffer [jbd] 0x74 (2.4.18-4smp)
eax: 0000001d   ebx: 00000000   ecx: c02eeec0   edx: 00003552
esi: 00000000   edi: f6a507e0   ebp: d3583400   esp: f6c75e44
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 145, stackpage=f6c75000)
Stack: f8854181 00000196 000015f4 f6b06c00 00000000 00000000 f7fe3520 00000000
	   f6a507e0 d3583400 f884cdf4 f6a507e0 f7fe3520 f6c75e98 000017f0 50195cfc
       00000000 00000fd4 e4d3f02c 00000003 f6a507e0 db272cd0 000017f0 c010de61
Call Trace: [<f8854181>] .rodata.str1.1 [jbd] 0x4e1
[<f884cdf4>] journal_commit_transaction [jbd] 0x7e4
[<c010de61>] timer_interrupt [kernel] 0xb1
[<f884f806>] kjournald [jbd] 0x136
[<f884f6b0>] commit_timeout [jbd] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<f884f6d0>] kjournald [jbd] 0x0


Code: 0f 0b 5e 5f 8b 7c 24 28 8b 4f 0c 85 c9 74 2e c7 44 24 0c 01


--Boundary_(ID_zXq3eNO7YhTejwbpiX6J/w)
Content-type: text/plain; charset=us-ascii; NAME=report.oops
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=report.oops

ksymoops 2.4.1 on i686 2.4.18-4smp.  Options used
     -V (default)
     -k /proc/ksyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.18-4smp/ (default)
     -m /boot/System.map-2.4.18-4smp (default)

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
Error (expand_objects): cannot stat(/lib/sym53c8xx.o) for sym53c8xx
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01cdad0, System.map says c0168570.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol nlmsvc_ops  , lockd says f8d69cd0, /lib/modules/2.4.18-4smp/kernel/fs/lockd/lockd.o says f8d69134.  Ignoring /lib/modules/2.4.18-4smp/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nfs_debug  , sunrpc says f8d5af04, /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o says f8d5abe4.  Ignoring /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nfsd_debug  , sunrpc says f8d5af08, /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o says f8d5abe8.  Ignoring /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nlm_debug  , sunrpc says f8d5af0c, /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o says f8d5abec.  Ignoring /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_debug  , sunrpc says f8d5af00, /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o says f8d5abe0.  Ignoring /lib/modules/2.4.18-4smp/kernel/net/sunrpc/sunrpc.o entry
Warning (map_ksym_to_module): cannot match loaded module ext3 to a unique module object.  Trace may not be reliable.
Warning (compare_maps): mismatch on symbol sd  , sd_mod says f882d240, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/sd_mod.o says f882d1a0.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): scsi_mod symbol GPLONLY_scsi_reset_provider not found in /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi  , scsi_mod says f8828654, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o says f8826d88.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_devicelist  , scsi_mod says f8828680, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o says f8826db4.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hostlist  , scsi_mod says f882867c, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o says f8826db0.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hosts  , scsi_mod says f8828684, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o says f8826db8.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level  , scsi_mod says f8828650, /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o says f8826d84.  Ignoring /lib/modules/2.4.18-4smp/kernel/drivers/scsi/scsi_mod.o entry
invalid operand: 0000
CPU:    0
EIP:    0010:[<f884f9b4>]    Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 0000001d   ebx: 00000000   ecx: c02eeec0   edx: 00003552
esi: 00000000   edi: f6a507e0   ebp: d3583400   esp: f6c75e44
ds: 0018   es: 0018   ss: 0018
Process kjournald (pid: 145, stackpage=f6c75000)
Stack: f8854181 00000196 000015f4 f6b06c00 00000000 00000000 f7fe3520 00000000
           f6a507e0 d3583400 f884cdf4 f6a507e0 f7fe3520 f6c75e98 000017f0 50195cfc
       00000000 00000fd4 e4d3f02c 00000003 f6a507e0 db272cd0 000017f0 c010de61
Call Trace: [<f8854181>] .rodata.str1.1 [jbd] 0x4e1
[<f884cdf4>] journal_commit_transaction [jbd] 0x7e4
[<c010de61>] timer_interrupt [kernel] 0xb1
[<f884f806>] kjournald [jbd] 0x136
[<f884f6b0>] commit_timeout [jbd] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<f884f6d0>] kjournald [jbd] 0x0
Code: 0f 0b 5e 5f 8b 7c 24 28 8b 4f 0c 85 c9 74 2e c7 44 24 0c 01

>>EIP; f884f9b4 <[jbd]journal_write_metadata_buffer+74/350>   <=====
Trace; f8854181 <[jbd].rodata.end+29c2/59b9>
Trace; f884cdf4 <[jbd]journal_commit_transaction+7e4/11e3>
Trace; c010de61 <timer_interrupt+b1/140>
Trace; f884f806 <[jbd]kjournald+136/1d0>
Trace; f884f6b0 <[jbd]commit_timeout+0/10>
Trace; c0107286 <kernel_thread+26/30>
Trace; f884f6d0 <[jbd]kjournald+0/1d0>
Code;  f884f9b4 <[jbd]journal_write_metadata_buffer+74/350>
00000000 <_EIP>:
Code;  f884f9b4 <[jbd]journal_write_metadata_buffer+74/350>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  f884f9b6 <[jbd]journal_write_metadata_buffer+76/350>
   2:   5e                        pop    %esi
Code;  f884f9b7 <[jbd]journal_write_metadata_buffer+77/350>
   3:   5f                        pop    %edi
Code;  f884f9b8 <[jbd]journal_write_metadata_buffer+78/350>
   4:   8b 7c 24 28               mov    0x28(%esp,1),%edi
Code;  f884f9bc <[jbd]journal_write_metadata_buffer+7c/350>
   8:   8b 4f 0c                  mov    0xc(%edi),%ecx
Code;  f884f9bf <[jbd]journal_write_metadata_buffer+7f/350>
   b:   85 c9                     test   %ecx,%ecx
Code;  f884f9c1 <[jbd]journal_write_metadata_buffer+81/350>
   d:   74 2e                     je     3d <_EIP+0x3d> f884f9f1 <[jbd]journal_write_metadata_buffer+b1/350>
Code;  f884f9c3 <[jbd]journal_write_metadata_buffer+83/350>
   f:   c7 44 24 0c 01 00 00      movl   $0x1,0xc(%esp,1)
Code;  f884f9ca <[jbd]journal_write_metadata_buffer+8a/350>
  16:   00 


14 warnings and 5 errors issued.  Results may not be reliable.

--Boundary_(ID_zXq3eNO7YhTejwbpiX6J/w)
Content-type: text/plain; charset=us-ascii; NAME=disk_state.log
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=disk_state.log

Fri May 10 14:39:23 PDT 2002
################################################################################Volume Name                   Quota      Used %Used   Partition
ext2                        5000000   1796177   36%          3%  
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1              2063504   1803376    155308  93% /
none                    515524         0    515524   0% /dev/shm
/dev/hda6              1035660    605072    377980  62% /scswork
/dev/hda8              2601116    644916   1824068  27% /tmp
/dev/hda7                23333      1043     21086   5% /mnt/t
/dev/hda5              1035660    753456    229596  77% /usr/vice/cache
/dev/hda3              1035692    137108    845972  14% /var
AFS                    9000000         0   9000000   0% /afs
total 4
-rwxr-xr-x    1 root     root          398 May  8 12:20 monitor
Available sets:
cmlongterm inactive
cm inactive (dormant)
cmfx : 60 kbytes (allocated)

--Boundary_(ID_zXq3eNO7YhTejwbpiX6J/w)--