[OpenAFS] 2.4.x afsd shutdown oops when processes still accessing /afs
Rudolph T Maceyko
rtm@cert.org
Tue, 09 Oct 2001 15:39:42 -0400
--==========809195387==========
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
When processes are still accessing afs (e.g., cwd is /afs/...) at the
time a shutdown is occurring things get messy.
The afs shutdown fails to umount /afs (EBUSY) and fails to unload the
module (EBUSY). Then several more umount attempts fail (still EBUSY),
the process gets killed and an oops results for any subsequent umount
/afs. (See attached.)
I've noticed this under Red Hat 7.1 with Red Hat's "stock" 2.4.[367]
kernels and OpenAFS 1.2.1. I have not tried other distributions or
kernels. I don't recall the "oops" behavior happening under these
circumstances with OpenAFS prior to 1.2.1.
Ordinarily the problem happens at shutdown time sufficiently late that
klogd isn't still running so there's nothing to run ksymoops on, so
I've reproduced the problem artificially.
(Make sure a process has cwd of /afs/...)
# /etc/init.d/afs stop
Stopping AFS services.....
inetd.afs: no process killed
umount: /afs: device is busy
libafs-2.4.7-2-i686: Device or resource busy
(kill the process whose cwd was /afs/...)
# umount /afs
(watch oops as it occurs)
When this happens at shutdown a power-cycle is required.
I wish I had the expertise to track this down further myself, but alas
I can't do that without guidance. The problem is about 90%
reproducible as shown above, and decreasing as I continue to exersize
the problem... I don't know why sometimes I don't get an oops.
When doing it by hand, make sure to umount /afs at the end and rmmod
libafs or the box will hang unmounting filesystems when you shutdown.
Rudy
--==========809195387==========
Content-Type: text/plain; charset=iso-8859-1; name="ksymoops.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="ksymoops.txt"; size=6681
ksymoops 2.4.0 on i686 2.4.7-2. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.7-2/ (default)
-m /boot/System.map-2.4.7-2 (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Warning (compare_maps): ksyms_base symbol =
__VERSIONED_SYMBOL(ataraid_gendisk) not found in System.map. Ignoring =
ksyms_base entry
Warning (compare_maps): ksyms_base symbol =
__VERSIONED_SYMBOL(ataraid_get_device) not found in System.map. =
Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol =
__VERSIONED_SYMBOL(ataraid_register_disk) not found in System.map. =
Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol =
__VERSIONED_SYMBOL(ataraid_release_device) not found in System.map. =
Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base =
says c01b7600, System.map says c0157550. Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol usb_devfs_handle , usbcore =
says e106f5a0, /lib/modules/2.4.7-2/kernel/drivers/usb/usbcore.o says =
e106f0c0. Ignoring /lib/modules/2.4.7-2/kernel/drivers/usb/usbcore.o =
entry
Oct 8 10:22:38 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 8 10:41:51 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 8 15:19:40 gti kernel: ac97_codec: AC97 Audio codec, id: =
0x8384:0x7609 (SigmaTel STAC9721/23)
Oct 8 15:46:19 gti kernel: ac97_codec: AC97 Audio codec, id: =
0x8384:0x7609 (SigmaTel STAC9721/23)
Oct 8 17:55:05 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 8 17:58:18 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 8 18:59:03 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 8 21:31:26 gti kernel: ac97_codec: AC97 Audio codec, id: =
0x8384:0x7609 (SigmaTel STAC9721/23)
Oct 8 23:48:39 gti kernel: ac97_codec: AC97 Audio codec, id: =
0x8384:0x7609 (SigmaTel STAC9721/23)
Oct 9 10:15:22 gti kernel: 3c59x: Donald Becker and others. =
www.scyld.com/network/vortex.html
Oct 9 10:41:06 gti kernel: kernel BUG at slab.c:1570!
Oct 9 10:41:06 gti kernel: invalid operand: 0000
Oct 9 10:41:06 gti kernel: CPU: 0
Oct 9 10:41:06 gti kernel: EIP: 0010:[kmem_cache_free+87/704]
Oct 9 10:41:06 gti kernel: EIP: 0010:[<c012dca7>]
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 9 10:41:06 gti kernel: EFLAGS: 00010282
Oct 9 10:41:06 gti kernel: eax: 0000001b ebx: e11c1fe0 ecx: =
00000001 edx: 00001f86
Oct 9 10:41:06 gti kernel: esi: 008cb744 edi: e11c1fe0 ebp: =
00000001 esp: dc79feb4
Oct 9 10:41:06 gti kernel: ds: 0018 es: 0018 ss: 0018
Oct 9 10:41:06 gti kernel: Process umount (pid: 1296, =
stackpage=3Ddc79f000)
Oct 9 10:41:06 gti kernel: Stack: c021b79c 00000622 c1b10134 c01264b5 =
e11c1fe0 e11c1fe0 e11c1fe0 00000001=20
Oct 9 10:41:06 gti kernel: c014a2fa c1b8832c e11c1fe0 e11c1fe0 =
c014b96b e11c1fe0 dd4d955c 00000246=20
Oct 9 10:41:06 gti kernel: df5c4394 df5bde04 df1cd3ac df1cd394 =
c01495c3 e11c1fe0 e11c1fe0 c0141690=20
Oct 9 10:41:06 gti kernel: Call Trace: [copyrite+16636/36756] =
[handle_mm_fault+101/224] [<e11c1fe0>] [<e11c1fe0>] [<e11c1fe0>]=20
Oct 9 10:41:06 gti kernel: Call Trace: [<c021b79c>] [<c01264b5>] =
[<e11c1fe0>] [<e11c1fe0>] [<e11c1fe0>]=20
Oct 9 10:41:06 gti kernel: [<c014a2fa>] [<e11c1fe0>] [<e11c1fe0>] =
[<c014b96b>] [<e11c1fe0>] [<c01495c3>]=20
Oct 9 10:41:06 gti kernel: [<e11c1fe0>] [<e11c1fe0>] =
[cached_lookup+16/80] [<e113fd60>] [<e113fd2c>] =
[shrink_dcache_parent+22/48]=20
Oct 9 10:41:06 gti kernel: [<e11c1fe0>] [<e11c1fe0>] [<c0141690>] =
[<e113fd60>] [<e113fd2c>] [<c01498c6>]=20
Oct 9 10:41:06 gti kernel: [<c013caea>] [<c0141678>] [<c013cf7a>] =
[<c0136f23>] [<c01275d2>] [<c013cfbc>]=20
Oct 9 10:41:06 gti kernel: [<c0106f0b>]=20
Oct 9 10:41:06 gti kernel: Code: 0f 0b 58 5a 8b 15 4c 14 2d c0 eb 0d =
90 90 90 90 90 90 90 90=20
>>EIP; c012dca7 <kmem_cache_free+57/2c0> <=3D=3D=3D=3D=3D
Trace; c021b79c <copyrite+40fc/8f94>
Trace; c01264b5 <handle_mm_fault+65/e0>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; c014a2fa <destroy_inode+2a/30>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; c014b96b <iput+19b/1b0>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; c01495c3 <prune_dcache+d3/160>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; e11c1fe0 <END_OF_CODE+73321/????>
Trace; c0141690 <cached_lookup+10/50>
Trace; e113fd60 <[libafs-2.4.7-2-i686]rcsid+40/120>
Trace; e113fd2c <[libafs-2.4.7-2-i686]rcsid+c/120>
Trace; c01498c6 <shrink_dcache_parent+16/30>
Trace; c013caea <kill_super+4a/100>
Trace; c0141678 <path_release+28/30>
Trace; c013cf7a <sys_umount+da/110>
Trace; c0136f23 <filp_close+53/60>
Trace; c01275d2 <sys_munmap+32/50>
Trace; c013cfbc <sys_oldumount+c/10>
Trace; c0106f0b <system_call+33/38>
Code; c012dca7 <kmem_cache_free+57/2c0>
00000000 <_EIP>:
Code; c012dca7 <kmem_cache_free+57/2c0> <=3D=3D=3D=3D=3D
0: 0f 0b ud2a <=3D=3D=3D=3D=3D
Code; c012dca9 <kmem_cache_free+59/2c0>
2: 58 pop %eax
Code; c012dcaa <kmem_cache_free+5a/2c0>
3: 5a pop %edx
Code; c012dcab <kmem_cache_free+5b/2c0>
4: 8b 15 4c 14 2d c0 mov 0xc02d144c,%edx
Code; c012dcb1 <kmem_cache_free+61/2c0>
a: eb 0d jmp 19 <_EIP+0x19> c012dcc0 =
<kmem_cache_free+70/2c0>
Code; c012dcb3 <kmem_cache_free+63/2c0>
c: 90 nop =20
Code; c012dcb4 <kmem_cache_free+64/2c0>
d: 90 nop =20
Code; c012dcb5 <kmem_cache_free+65/2c0>
e: 90 nop =20
Code; c012dcb6 <kmem_cache_free+66/2c0>
f: 90 nop =20
Code; c012dcb7 <kmem_cache_free+67/2c0>
10: 90 nop =20
Code; c012dcb8 <kmem_cache_free+68/2c0>
11: 90 nop =20
Code; c012dcb9 <kmem_cache_free+69/2c0>
12: 90 nop =20
Code; c012dcba <kmem_cache_free+6a/2c0>
13: 90 nop =20
7 warnings issued. Results may not be reliable.
--==========809195387==========--