[OpenAFS] 1.4.4 client on EL3: panic in afs_HashOutDcache
Stephan Wiesand
Stephan.Wiesand@desy.de
Thu, 12 Apr 2007 09:30:59 +0200 (CEST)
On Wed, 11 Apr 2007, Derrick J Brashear wrote:
> On Wed, 11 Apr 2007, Stephan Wiesand wrote:
>
>> One of our systems panicked two times within 2 hours yesterday, at the same
>> location in the OpenAFS client. I attached the kernel's last words below.
>>
>> This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build
>> has two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from
>> CVS, and the one from
>> https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html
>
> afs_HashOutDCache has
>     /* if this guy is in the hash table, pull him out */
>     if (adc->f.fid.Fid.Volume != 0) {
>         i = DCHash(&adc->f.fid, adc->f.chunk);
>         us = afs_dchashTbl[i];
>         if (us == adc->index) {
>             ..
>         } else {
>             /* somewhere on the chain */
>             while (us != NULLIDX) {
>                 if (afs_dcnextTbl[us] == adc->index) {
>                     /* found item pointing at the one to delete */
>                     afs_dcnextTbl[us] = afs_dcnextTbl[adc->index];
>                     break;
>                 }
>                 us = afs_dcnextTbl[us];
>             }
>             if (us == NULLIDX)
>                 osi_Panic("dcache hc");
>
> so basically you appear to have an unhashed dcache entry. Either there's a
> locking bug or something is becoming erroneously unhashed.
>
> How reproducible is it?
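
To make the failure mode concrete for myself, here is a stand-alone
user-space sketch of the unlink logic quoted above. It is only a model, not
the OpenAFS code: the table sizes, bucket_of(), hash_in() and hash_out() are
names I made up, the head-of-chain branch (elided as ".." in the excerpt) is
my assumption, and where the real code calls osi_Panic("dcache hc") the
sketch just returns -1.

```c
#include <stddef.h>

#define NULLIDX   (-1)  /* end-of-chain marker, named as in the quoted excerpt */
#define HASH_SIZE 4     /* hypothetical tiny table, for illustration only */
#define N_ENTRIES 8     /* hypothetical number of cache entries */

static int hash_tbl[HASH_SIZE];  /* models afs_dchashTbl: head index per bucket */
static int next_tbl[N_ENTRIES];  /* models afs_dcnextTbl: next index on the chain */

/* Hypothetical stand-in for DCHash(); any stable mapping works for the model. */
static int bucket_of(int key)
{
    return (int)((unsigned)key % HASH_SIZE);
}

/* Mark every bucket and every chain link as empty. */
void table_init(void)
{
    int i;
    for (i = 0; i < HASH_SIZE; i++)
        hash_tbl[i] = NULLIDX;
    for (i = 0; i < N_ENTRIES; i++)
        next_tbl[i] = NULLIDX;
}

/* Push entry 'index' onto the head of the chain for 'key'. */
void hash_in(int index, int key)
{
    int b = bucket_of(key);
    next_tbl[index] = hash_tbl[b];
    hash_tbl[b] = index;
}

/*
 * Unlink entry 'index' from the chain for 'key'.
 * Returns 0 on success, -1 if the entry is not on the chain. The -1 case
 * corresponds to the point where the quoted code panics with "dcache hc".
 */
int hash_out(int index, int key)
{
    int b = bucket_of(key);
    int us = hash_tbl[b];

    if (us == index) {
        /* entry is the chain head (assumed; elided in the excerpt) */
        hash_tbl[b] = next_tbl[index];
    } else {
        /* somewhere on the chain: find the predecessor */
        while (us != NULLIDX) {
            if (next_tbl[us] == index) {
                next_tbl[us] = next_tbl[index];
                break;
            }
            us = next_tbl[us];
        }
        if (us == NULLIDX)
            return -1;  /* walked the whole chain without finding it */
    }
    next_tbl[index] = NULLIDX;
    return 0;
}
```

In this model, calling hash_out() twice for the same entry, or calling it with
a key that no longer matches the one used in hash_in(), ends in the -1 branch.
Those would be two ways to arrive at an "unhashed" entry; whether either of
them is what actually happens in the client is exactly the open question.
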
Good news: it is reproducible. The user confessed that he'd run "less than
20" parallel rsyncs transferring data to our cell. The files are a mixture
of data and log files, with typical sizes of 15 MB and 100 kB.
So I set up a dozen rsyncs to copy this data into another volume, and
after some 9 hours got the panic you find below.
I'm going to repeat this exercise now, and will also try to make the panic
happen earlier (more rsyncs, read data from a faster source - any other
ideas?).
Just wondering what to do next then.
Thanks for caring,
Stephan
PS Here's the Oops:
dcache hc<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
f8a6da50
*pde = 34669001
*pte = 5b103067
Oops: 0002
panfs nfs lockd sunrpc openafs netconsole 3c59x mii microcode ohci1394 ieee1394 loop keybdev mousedev hid input usb-uhci usbcore ext3 jbd lvm-mod aic7xxx disk
CPU: 2
EIP: 0060:[<f8a6da50>] Tainted: P
EFLAGS: 00010282
EIP is at osi_Panic [openafs] 0x20 (2.4.21-47.0.1.ELsmp/i686)
eax: 00000009 ebx: f8b74000 ecx: 00000046 edx: c0388e98
esi: f8c328c0 edi: 0015fa73 ebp: 0000000d esp: f5427e04
ds: 0068 es: 0068 ss: 0068
Process afs_cachetrim (pid: 987, stackpage=f5427000)
Stack: f8a9365b 00000002 00000000 f8a46e77 f8c328c0 0015fa73 0000000d f8a2d9ef
f8a9365b 00000002 00000000 f8a46e77 f8c328c0 d4938380 0015fa73 f8a2d6a8
f8c328c0 00000000 00000000 0000f2da d0928990 00000000 00000000 4dd6d295
Call Trace: [<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf5427e04)
[<f8a46e77>] shutdown_vcache [openafs] 0x357 (0xf5427e10)
[<f8a2d9ef>] afs_HashOutDCache [openafs] 0x7f (0xf5427e20)
[<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf5427e24)
[<f8a46e77>] shutdown_vcache [openafs] 0x357 (0xf5427e30)
[<f8a2d6a8>] afs_GetDownD [openafs] 0x528 (0xf5427e40)
[<f8a2cd2e>] afs_CacheTruncateDaemon [openafs] 0x12e (0xf5427fa0)
[<f8a7f9f0>] afsd_thread [openafs] 0x3e0 (0xf5427fe0)
[<f8a7f610>] afsd_thread [openafs] 0x0 (0xf5427fe4)
[<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf5427ff0)
Code: c6 05 00 00 00 00 00 83 c4 1c c3 90 8d 74 26 00 b8 4f 42 a9
Kernel panic: Fatal exception
--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany