[OpenAFS-devel] [PATCH] fix openafs crashes on linux 2.6.10-2.6.12, and all RHEL4 kernels

Christopher Allen Wing wingc@engin.umich.edu
Wed, 18 Apr 2007 14:45:36 -0400 (EDT)


Hello,


On Wed, 18 Apr 2007, Chaskiel M Grundman wrote:

> While your change (make osi_Alloc not run under the GLOCK) is completely 
> legitimate, your findings indicate a problem with the linux_alloc 
> implementation. I would suggest the following also be done (not in the 
> link-fix patch):
>
> in the vmalloc branch of LINUX/osi_alloc.c:linux_alloc, the code should 
> assert if (!drop_glock && haveGlock) and drop the glock around the vmalloc 
> call if (drop_glock && haveGlock)
>
>       } else {
> +         osi_Assert(drop_glock || !haveGlock);
> +         if (drop_glock && haveGlock)
> +               AFS_GUNLOCK();
>           new = (void *)vmalloc(asize);
> +           if (drop_glock && haveGlock)
> +               AFS_GLOCK();
>           if (new)            /* piggy back alloc type */
>               new = (void *)(VM_TYPE | (unsigned long)new);
>       }
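
For anyone following along, here is a minimal userspace sketch of the
drop-and-reacquire pattern suggested above. It is only an illustration under
stated assumptions: a pthread mutex stands in for the GLOCK, malloc() stands in
for vmalloc(), and glock_acquire(), glock_release(), have_glock and
alloc_may_block() are hypothetical names, not OpenAFS code. The single global
ownership flag is a single-threaded simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a mutex for the GLOCK, a flag for "we hold it". */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static int have_glock = 0;

static void glock_acquire(void)
{
    pthread_mutex_lock(&glock);
    have_glock = 1;
}

static void glock_release(void)
{
    have_glock = 0;
    pthread_mutex_unlock(&glock);
}

/* Allocate memory that may block, never holding the lock across the call. */
static void *alloc_may_block(size_t size, int drop_glock)
{
    int held = have_glock;
    void *p;

    /* Holding the lock without permission to drop it is a caller bug. */
    assert(drop_glock || !held);

    if (drop_glock && held)
        glock_release();
    p = malloc(size);           /* stand-in for the blocking vmalloc() */
    if (drop_glock && held)
        glock_acquire();        /* caller gets the lock back on return */

    return p;
}
```

A caller that holds the lock and passes drop_glock = 1 gets it back on return,
but anything the lock protects may have changed while it was released.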


Note that you can pass GFP_NOFS to the vmalloc allocator if you call 
__vmalloc() instead of vmalloc().  __vmalloc() is also an exported symbol 
in Linux:


(untested)

--- openafs/src/afs/LINUX/osi_alloc.c.orig	2004-12-07 01:12:12.000000000 -0500
+++ openafs/src/afs/LINUX/osi_alloc.c	2007-04-18 14:38:34.000000000 -0400
@@ -57,7 +57,7 @@
  /* externs : can we do this in a better way. Including vmalloc.h causes other
   * problems.*/
  extern void vfree(void *addr);
-extern void *vmalloc(unsigned long size);
+extern void *__vmalloc(unsigned long size, int gfp_mask, pgprot_t prot);
  #endif

  /* Allocator support functions (static) */
@@ -98,7 +98,9 @@
  	    if (new)		/* piggy back alloc type */
  		new = (void *)(KM_TYPE | (unsigned long)new);
  	} else {
-	    new = (void *)vmalloc(asize);
+	    new = (void *)__vmalloc(asize,
+				    GFP_NOFS | __GFP_HIGHMEM,
+				    PAGE_KERNEL);
  	    if (new)		/* piggy back alloc type */
  		new = (void *)(VM_TYPE | (unsigned long)new);
  	}
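
On why GFP_KERNEL should stay out of the mask passed to __vmalloc(): in the
2.6-era gfp.h, GFP_NOFS is GFP_KERNEL without __GFP_FS, so ORing the two gives
GFP_KERNEL back and the allocator may still call into filesystem code. The
sketch below shows the bit arithmetic in plain userspace C. The DEMO_ names and
bit values are illustrative assumptions, not the kernel's definitions; only the
way the masks are composed is meant to match.

```c
/* Illustrative bit values; the real ones are in <linux/gfp.h> and vary
 * between kernel versions. */
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_WAIT    0x10u
#define DEMO_GFP_IO      0x40u
#define DEMO_GFP_FS      0x80u

/* Composite masks, composed the same way as in 2.6-era kernels. */
#define DEMO_GFP_NOFS    (DEMO_GFP_WAIT | DEMO_GFP_IO)
#define DEMO_GFP_KERNEL  (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Returns nonzero if an allocation with this mask may re-enter the FS. */
static int may_enter_fs(unsigned int mask)
{
    return (mask & DEMO_GFP_FS) != 0;
}
```

With these definitions, may_enter_fs(DEMO_GFP_KERNEL | DEMO_GFP_NOFS) is
nonzero, while may_enter_fs(DEMO_GFP_NOFS | DEMO_GFP_HIGHMEM) is zero.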




I don't understand the usage of AFS_GLOCK(); why is it safe to drop and 
re-acquire it in cases like this?  Is AFS_GLOCK intended to serialize all 
AFS calls coming from vnode methods in the kernel?

Can anyone (briefly) enlighten me?


Thanks,

Chris Wing
wingc@engin.umich.edu