[OpenAFS-devel] [PATCH] fix openafs crashes on linux 2.6.10-2.6.12, and all RHEL4 kernels

Chaskiel M Grundman cg2v@andrew.cmu.edu
Wed, 18 Apr 2007 13:32:01 -0400


--On Wednesday, April 18, 2007 11:07:45 AM -0400 Christopher Allen Wing 
<wingc@engin.umich.edu> wrote:

> GFP_NOFS tells the allocator not to recurse back into the filesystem if
> it's necessary to free up memory.  However, vmalloc() does not have such
> an option.  Therefore, calling osi_Alloc() to request more than a page of
> memory may end up recursing back into AFS to try to free unused inodes or
> dentries.
>
> In this case, what happened was that osi_Alloc() is called within an
> AFS_GLOCK(); osi_Alloc() calls vmalloc() which tries to free dentry
> objects, which then calls back into the AFS module.  Unfortunately,
> AFS_GLOCK() is already held and we deadlock.
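A minimal, single-threaded user-space sketch of the recursion described above. It assumes these hypothetical stand-ins, none of which are OpenAFS or kernel APIs: glock_held for AFS_GLOCK ownership, model_vmalloc() for a vmalloc() call that triggers reclaim, and afs_reclaim_hook() for the AFS dentry/inode release path. Rather than hanging, the model records the point where a real kernel thread would block on a GLOCK it already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model state: GLOCK ownership and a deadlock detector. */
static bool glock_held;
static bool would_deadlock;

/* Non-recursive lock. Taking it while already held is where the real
 * kernel thread would sleep forever; the model records that instead. */
static bool model_glock(void)
{
    if (glock_held) {
        would_deadlock = true;
        return false;
    }
    glock_held = true;
    return true;
}

static void model_gunlock(void)
{
    glock_held = false;
}

/* Stand-in for the filesystem reclaim callback: it needs the GLOCK. */
static void afs_reclaim_hook(void)
{
    if (model_glock()) {
        /* ... release unused dentries/inodes here ... */
        model_gunlock();
    }
}

/* vmalloc() has no GFP_NOFS-style option, so in this model it always
 * recurses into filesystem reclaim before handing back memory. */
static void *model_vmalloc(size_t size)
{
    afs_reclaim_hook();
    return malloc(size);
}
```

Calling model_vmalloc() without the GLOCK lets reclaim proceed; calling it with the GLOCK held reproduces the self-deadlock condition.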

While your change (making osi_Alloc not run under the GLOCK) is completely 
legitimate, your findings indicate a problem in the linux_alloc 
implementation itself. I would suggest the following also be done (not in 
the link-fix patch):

In the vmalloc branch of LINUX/osi_alloc.c:linux_alloc, the code should 
assert that it is never entered with the GLOCK held when drop_glock is 0 
(i.e. fail if !drop_glock && haveGlock), and should drop the GLOCK around 
the vmalloc() call when drop_glock && haveGlock:

         } else {
+            osi_Assert(drop_glock || !haveGlock);
+            if (drop_glock && haveGlock)
+                AFS_GUNLOCK();
             new = (void *)vmalloc(asize);
+            if (drop_glock && haveGlock)
+                AFS_GLOCK();
             if (new)            /* piggy back alloc type */
                 new = (void *)(VM_TYPE | (unsigned long)new);
         }

This change will not affect the current caller that passes drop_glock = 0: 
sizeof(afs_event_t) is nowhere near the PAGE_SIZE limit, so that request is 
served by the kmalloc branch and never reaches the vmalloc code above.
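A user-space sketch of the proposed linux_alloc behavior, assuming a single thread and these hypothetical stand-ins (not OpenAFS names): MODEL_PAGE_SIZE for PAGE_SIZE, glock_held for ISAFS_GLOCK(), model_kmalloc_nofs() and model_vmalloc() for kmalloc(..., GFP_NOFS) and vmalloc(), and MODEL_VM_TYPE for the low-bit "piggy back" tag. It shows the size dispatch, the assertion, and the GLOCK being dropped only around the call that can recurse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u  /* hypothetical PAGE_SIZE */
#define MODEL_VM_TYPE   0x1u   /* hypothetical low-bit tag for vmalloc memory */

static bool glock_held;        /* stands in for ISAFS_GLOCK() */
static int reclaim_saw_glock;  /* times reclaim ran with the GLOCK held */

static void model_reclaim(void)
{
    if (glock_held)
        reclaim_saw_glock++;   /* in the kernel this would self-deadlock */
}

static void *model_kmalloc_nofs(size_t size)
{
    return malloc(size);       /* GFP_NOFS: never recurses into the fs */
}

static void *model_vmalloc(size_t size)
{
    model_reclaim();           /* may recurse into the filesystem */
    return malloc(size);
}

static void *model_linux_alloc(size_t asize, int drop_glock)
{
    bool have_glock = glock_held;
    void *new;

    if (asize <= MODEL_PAGE_SIZE) {
        new = model_kmalloc_nofs(asize);
    } else {
        /* A caller holding the GLOCK must allow it to be dropped. */
        assert(drop_glock || !have_glock);
        if (drop_glock && have_glock)
            glock_held = false;    /* AFS_GUNLOCK() */
        new = model_vmalloc(asize);
        if (drop_glock && have_glock)
            glock_held = true;     /* AFS_GLOCK() */
        if (new)
            new = (void *)(MODEL_VM_TYPE | (uintptr_t)new);
    }
    return new;
}

/* Strip the tag before handing the pointer back to free(). */
static void model_free(void *p)
{
    free((void *)((uintptr_t)p & ~(uintptr_t)MODEL_VM_TYPE));
}
```

A small request with drop_glock = 0 (the afs_event_t case) stays on the kmalloc path and never reaches the assertion; a multi-page request with drop_glock = 1 runs reclaim without the GLOCK and returns with it held again.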