[OpenAFS] 1.4.4 client on EL3: panic in afs_HashOutDcache

Stephan Wiesand Stephan.Wiesand@desy.de
Wed, 11 Apr 2007 11:45:36 +0200 (CEST)


On Wed, 11 Apr 2007, Derrick J Brashear wrote:

> On Wed, 11 Apr 2007, Stephan Wiesand wrote:
>
>> One of our systems panicked twice within 2 hours yesterday, at the same 
>> location in the OpenAFS client. I attached the kernel's last words below.
>> 
>> This is an SL3 system, kernel 2.4.21-47.0.1.ELsmp, i686. The client build 
>> has two patches on top of 1.4.4: linux-task-pointer-safety-20070320 from 
>> CVS, and the one from
>> https://lists.openafs.org/pipermail/openafs-devel/2007-March/014985.html
>
> afs_HashOutDCache has
>    /* if this guy is in the hash table, pull him out */
>    if (adc->f.fid.Fid.Volume != 0) {
>        i = DCHash(&adc->f.fid, adc->f.chunk);
>        us = afs_dchashTbl[i];
>        if (us == adc->index) {
> ..
>        } else {
>            /* somewhere on the chain */
>            while (us != NULLIDX) {
>                if (afs_dcnextTbl[us] == adc->index) {
>                    /* found item pointing at the one to delete */
>                    afs_dcnextTbl[us] = afs_dcnextTbl[adc->index];
>                    break;
>                }
>                us = afs_dcnextTbl[us];
>            }
>            if (us == NULLIDX)
>                osi_Panic("dcache hc");
>
> so basically you appear to have an unhashed dcache entry. Either there's a 
> locking bug or something is becoming erroneously unhashed.
>
> How reproducible is it?

Not easily. I tried to apply some cache pressure by reading several large 
files at the same time, but no luck yet. I'll try to get my suspect to 
admit what he actually did.
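
For anyone following along, the chain walk quoted above can be sketched as a 
small standalone C fragment. This is only an illustration under assumptions, 
not the OpenAFS code: hash_tbl and next_tbl are stand-ins for afs_dchashTbl 
and afs_dcnextTbl, the table sizes and the chain_* helper names are made up, 
the head-of-chain branch (elided as ".." in the quote) is my guess at the 
obvious unlink, and the sketch returns -1 where the real code calls 
osi_Panic("dcache hc").

```c
#include <assert.h>

#define NULLIDX  (-1)  /* end-of-chain marker, as in the quoted code */
#define NBUCKETS 4     /* hypothetical table sizes for this sketch */
#define NENTRIES 8

static int hash_tbl[NBUCKETS]; /* head index per bucket (stand-in for afs_dchashTbl) */
static int next_tbl[NENTRIES]; /* next index on the chain (stand-in for afs_dcnextTbl) */

/* Mark every bucket and every chain link as empty. */
static void chain_init(void)
{
    int i;
    for (i = 0; i < NBUCKETS; i++)
        hash_tbl[i] = NULLIDX;
    for (i = 0; i < NENTRIES; i++)
        next_tbl[i] = NULLIDX;
}

/* Push entry idx onto the front of the chain for bucket. */
static void chain_insert(int bucket, int idx)
{
    assert(bucket >= 0 && bucket < NBUCKETS);
    assert(idx >= 0 && idx < NENTRIES);
    next_tbl[idx] = hash_tbl[bucket];
    hash_tbl[bucket] = idx;
}

/*
 * Unlink entry idx from the chain for bucket.
 * Returns 0 on success, -1 if idx is not on that chain. The -1 case
 * corresponds to the point where the quoted code panics.
 */
static int chain_remove(int bucket, int idx)
{
    int us = hash_tbl[bucket];

    if (us == idx) {
        /* first on the chain (assumed behaviour; elided in the quote) */
        hash_tbl[bucket] = next_tbl[idx];
    } else {
        /* somewhere on the chain */
        while (us != NULLIDX) {
            if (next_tbl[us] == idx) {
                /* found item pointing at the one to delete */
                next_tbl[us] = next_tbl[idx];
                break;
            }
            us = next_tbl[us];
        }
        if (us == NULLIDX)
            return -1; /* entry was never hashed, or already unhashed */
    }
    next_tbl[idx] = NULLIDX;
    return 0;
}
```

In this sketch, removing the same entry twice, or removing it from a bucket 
it was never inserted into, takes the -1 path. That matches the two 
explanations given above: an entry that is unhashed twice (for example 
through a locking bug), or one that has erroneously become unhashed.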


-- 
Stephan Wiesand
    DESY - DV -
    Platanenallee 6
    15738 Zeuthen, Germany