[OpenAFS] 1.3.85 Still Crashing w/ Fedora 3 (Linux 2.6.11)

Jason McCormick jasonmc@cert.org
Mon, 18 Jul 2005 18:24:35 -0400


--On Wednesday, July 13, 2005 11:56:51 AM -0400 Derrick J Brashear
<shadow@dementia.org> wrote:

 
>> have a box hooked up to a serial port console with a logger to try catch
>> another.
> 
> Thank you.

Unfortunately the oops is not helpful, because the kernel blows up before
it has time to print. It prints one hex value that looks like a memory
address and then dies hard.  It looks like:

[<c01bd35c>]

 I'm nursing a theory that the bug only triggers in combination with
VMware.  I tried for a few days to crash one of the test servers in our
machine room already on the console without success, but it only took about
30 minutes of random file actions to cause the workstation to crash -- with
VMware running.

 There's also a filesystem unmount bug that I've seen sporadically (about
50% of the time), sometimes with a message of:

Unmounting file systems:  Failed to invalidate all pages on inode 0xf5f85800

This is sometimes (but not always) accompanied by an oops. That oops is:

Stopping AFS services.....
Failed to invalidate all pages on inode 0xf652f580
WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
[<c0146a91>] kmem_cache_destroy+0xdc/0x132
[<f94b4ec8>] afs_destroy_inodecache+0xd/0x25 [libafs]
[<f94c6c19>] cleanup_module+0x19/0x25 [libafs]
[<c0136a25>] sys_delete_module+0x148/0x166
[<c0151080>] unmap_vma_list+0xe/0x17
[<c01513e1>] do_munmap+0xff/0x143
[<c0103f0f>] syscall_call+0x7/0xb

 I've seen this on machines that aren't running VMware as well, so it's a
different issue.  After the above error, the machine locks up and has to be
power-cycled.
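
 For anyone sifting through traces like the one above, here is a minimal
Python sketch that splits the frame lines into address, symbol, offset, and
module so the libafs frames are easy to pick out.  It is not part of OpenAFS
or the kernel; the regex and the parse_frames name are illustrative
assumptions, based only on the frame layout seen in this oops.

```python
import re

# Assumed frame layout, taken from the oops above, e.g.:
#   [<f94b4ec8>] afs_destroy_inodecache+0xd/0x25 [libafs]
# The symbol and module parts are optional, so a bare "[<c01bd35c>]"
# line still parses (with symbol, offset, and module left as None).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]"
    r"(?:\s+(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+))?"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text):
    """Return one dict per stack-frame line found in an oops dump."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a frame line (log message, blank line, ...)
        offset = m.group("offset")
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(offset, 16) if offset else None,
            "module": m.group("module"),
        })
    return frames
```

 Filtering the result on module == "libafs" for the trace above would leave
just the afs_destroy_inodecache and cleanup_module frames.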

 I'd say the first error (possibly a combination with VMware) shouldn't hold
up rc1.  However, the second error is probably more serious at this point.
All tests were done on Fedora 3 hosts with 2.6.11-1.35_FC3smp and non-smp
kernels.

-- Jason