[OpenAFS-devel] Re: [openMosix-general] OpenAFS + openMosix

Moshe Bar moshe@Moelabs.com
Wed, 4 Dec 2002 12:54:57 +0200 (IST)


Great analysis and problem resolution!

thanks

Moshe

On Tue, 3 Dec 2002, Onime Clement wrote:

> 
> Subject: OpenAFS + OpenMosix
> 
> I seem to have discovered a bug in the OpenAFS kernel module code that
> causes Linux to hang when running an openMosix kernel.
> This problem was reported earlier this year with previous OpenAFS and
> openMosix versions.
> 
> OpenAFS version 1.2.7 
> OpenMosix kernel version 2.4.18-openmosix3
> RedHat Linux 7.3
> 
> Symptoms:
> =========
> When running on an openMosix kernel, some applications that try to close
> files that were mapped with the mmap call run into problems. The
> application hangs, and it is impossible to kill it or to run top, ps, kill
> or any other /proc-based utility.
> Although the computer is still usable, it is also impossible to halt or
> reboot the machine via software. Only a hard reset or power-off works.
> 
> How to reproduce:
> =================
> The problem can be reproduced by running Netscape or StarOffice 5.2 with
> your home directory in AFS, on a machine running an openMosix kernel.
> 
> Analysis:
> =========
> I traced the problem to the routine afs_linux_vma_close in the
> osi_vnodeops.c file, and in particular to the afs_close call.
> Preliminary tracing using printk indicates that calls to afs_close in
> that function sometimes do not return.
> Specifically, when afs_close is called with vcp->execsOrWriters > 0, it
> blocks, which causes the hang.
> 
> Patch:
> ======
> A simple patch to osi_vnodeops.c to fix this problem is included below and
> attached to this mail.
> The patch needs to be applied from the src directory or with the right patch
> -p option.
> 
> 
> Thanks
> Clement Onime
> System and Network Analyst
> The Abdus Salam ICTP
> Trieste, Italy
> 
> <--------------------- cut here --------------------->
> 
> --- afs/osi_vnodeops.c.orig	Thu Aug  1 21:12:01 2002
> +++ afs/osi_vnodeops.c	Tue Dec  3 09:43:21 2002
> @@ -342,12 +342,26 @@
>      if (vcp->mapcnt) {
>  	vcp->mapcnt--;
> +	if (vcp->mapcnt == 0) {
> +		/* This bit is supposed to undo what was
> +		 * done on lines */
> +	   if (vcp->execsOrWriters > 0)
> +	    	vcp->execsOrWriters--;
> +	    if (vcp->opens > 0)
> +	    	vcp->opens--;
> +	    /* vcp->states &= ~CMAPPED; */
> +	}
>  	ReleaseWriteLock(&vcp->lock);
>  	if (!vcp->mapcnt) {
>  	    credp = crref();
> -	    (void) afs_close(vcp, vmap->vm_file->f_flags, credp);
> +	/* 	printk("AFSMM: afs_close: Mapcnt=%d Opens=%d execsOrWriters=%d\n", vcp->mapcnt, vcp->opens, vcp->execsOrWriters); */
> +	       /* It appears afs_close blocks if called when execsOrWriters > 0 */
> +		if (vcp->execsOrWriters == 0) 
> +	    		(void) afs_close(vcp, vmap->vm_file->f_flags, credp);
>  	    /* only decrement the execsOrWriters flag if this is not a writable
>  	     * file. */
> +		/* Why the limitation here */
>  	    if (! (vmap->vm_file->f_flags & (FWRITE | FTRUNC)))
> -		vcp->execsOrWriters--;
> +		if (vcp->execsOrWriters > 0)
> +			vcp->execsOrWriters--;
>  
>  	    vcp->states &= ~CMAPPED;
> @@ -398,6 +412,10 @@
>      }
>  
> +
>      if (code == 0) {
>  	ObtainWriteLock(&vcp->lock,531);
> +	/* Add an open reference on the first mapping. */
> +	if (vcp->mapcnt == 0) {
> +	    	/* Only for the first time */
>  	/* Set out vma ops so we catch the close. The following test should be
>  	 * the same as used in generic_file_mmap.
> @@ -420,7 +438,5 @@
>  	}
>      
> -    
> -	/* Add an open reference on the first mapping. */
> -	if (vcp->mapcnt == 0) {
> +   		/*These variables get set when mapcnt == 0 */ 
>  	    vcp->execsOrWriters++;
>  	    vcp->opens++;
> 
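
For readers following the counters in the quoted patch, here is a minimal
user-space sketch of the bookkeeping it appears to aim for. This is not
OpenAFS code: struct vc_model, model_mmap, model_vma_close and model_check
are hypothetical names, locking and the f_flags test are left out, and the
assumption that each mapping increments mapcnt is mine rather than something
shown in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the three vcache counters named in the patch. */
struct vc_model {
    int mapcnt;          /* live mappings (assumed incremented per mmap) */
    int opens;           /* open references */
    int execsOrWriters;  /* exec/write references */
};

/* Invariant the guarded decrements are meant to keep: nothing goes negative. */
static void model_check(const struct vc_model *vc)
{
    assert(vc->mapcnt >= 0);
    assert(vc->opens >= 0);
    assert(vc->execsOrWriters >= 0);
}

/* Only the first mapping takes the extra open reference. */
static void model_mmap(struct vc_model *vc)
{
    if (vc->mapcnt == 0) {
        vc->execsOrWriters++;
        vc->opens++;
    }
    vc->mapcnt++;
    model_check(vc);
}

/*
 * Drop one mapping. When the last mapping goes away, undo the reference
 * taken in model_mmap, guarding each decrement against underflow.
 * Returns 1 if the patched code would go on to call afs_close
 * (execsOrWriters has reached 0), 0 otherwise.
 */
static int model_vma_close(struct vc_model *vc)
{
    int would_close = 0;

    if (vc->mapcnt == 0)
        return 0;               /* nothing mapped, nothing to undo */

    vc->mapcnt--;
    if (vc->mapcnt == 0) {
        if (vc->execsOrWriters > 0)
            vc->execsOrWriters--;
        if (vc->opens > 0)
            vc->opens--;
        would_close = (vc->execsOrWriters == 0);
    }
    model_check(vc);
    return would_close;
}
```

Starting from all-zero counters, two model_mmap calls followed by two
model_vma_close calls bring every counter back to zero, and only the second
close reports that afs_close would be called. If execsOrWriters is still
above zero after the last unmap (for example, another writer holds the
file), the model skips afs_close, which matches the guard in the patch.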

--