[OpenAFS] 1.4.4 client on EL3: panic in afs_HashOutDCache

Derrick J Brashear shadow@dementia.org
Wed, 11 Apr 2007 04:56:59 -0400 (EDT)


On Wed, 11 Apr 2007, Stephan Wiesand wrote:

> One of our systems panicked two times within 2 hours yesterday, at the same 
> location in the OpenAFS client. I attached the kernel's last words below.
>
> This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build has 
> two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from CVS, and 
> the one from
> https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html

afs_HashOutDCache has
     /* if this guy is in the hash table, pull him out */
     if (adc->f.fid.Fid.Volume != 0) {
         i = DCHash(&adc->f.fid, adc->f.chunk);
         us = afs_dchashTbl[i];
         if (us == adc->index) {
..
         } else {
             /* somewhere on the chain */
             while (us != NULLIDX) {
                 if (afs_dcnextTbl[us] == adc->index) {
                     /* found item pointing at the one to delete */
                     afs_dcnextTbl[us] = afs_dcnextTbl[adc->index];
                     break;
                 }
                 us = afs_dcnextTbl[us];
             }
             if (us == NULLIDX)
                 osi_Panic("dcache hc");

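So basically you appear to have an unhashed dcache entry: its volume is
nonzero, so afs_HashOutDCache treats it as hashed, but it is neither the head
of nor linked into the chain for its DCHash bucket. Either there's a locking
bug or something is becoming erroneously unhashed.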

How reproducible is it?

> dcache hc
> <1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> f8a6da50
> *pde = 13ad0001
> *pte = 00000000
> Oops: 0002
> panfs nfs lockd sunrpc openafs netconsole 3c59x mii microcode ohci1394 ieee1394 loop keybdev mousedev hid input usb-uhci usbcore ext3 jbd lvm-mod aic7xxx disk
> CPU:    3
> EIP:    0060:[<f8a6da50>]    Tainted: P
> EFLAGS: 00210282
>
> EIP is at osi_Panic [openafs] 0x20 (2.4.21-47.0.1.ELsmp/i686)
> eax: 00000009   ebx: f8b74000   ecx: 00200046   edx: c0388e98
> esi: f8c43080   edi: 00027b31   ebp: 00000002   esp: f2a39e04
> ds: 0068   es: 0068   ss: 0068
> Process afs_cachetrim (pid: 980, stackpage=f2a39000)
> Stack: f8a9365b 00000001 00000000 f8c43080 f8c43080 00027b31 00000002 f8a2d9ef
>        f8a9365b 00000001 00000000 f8c43080 f8c43080 ed689680 00027b31 f8a2d6a8
>        f8c43080 00000000 00000000 00000937 f2a39e94 c0123410 00000000 116c94c6
> Call Trace:   [<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf2a39e04)
> [<f8a2d9ef>] afs_HashOutDCache [openafs] 0x7f (0xf2a39e20)
> [<f8a9365b>] .rodata.str1.1 [openafs] 0x11f (0xf2a39e24)
> [<f8a2d6a8>] afs_GetDownD [openafs] 0x528 (0xf2a39e40)
> [<c0123410>] load_balance [kernel] 0x30 (0xf2a39e58)
> [<f8a2cd2e>] afs_CacheTruncateDaemon [openafs] 0x12e (0xf2a39fa0)
> [<f8a7f9f0>] afsd_thread [openafs] 0x3e0 (0xf2a39fe0)
> [<f8a7f610>] afsd_thread [openafs] 0x0 (0xf2a39fe4)
> [<c01095cd>] kernel_thread_helper [kernel] 0x5 (0xf2a39ff0)