[OpenAFS-devel] Re: [openMosix-general] OpenAFS + openMosix
Moshe Bar
moshe@Moelabs.com
Wed, 4 Dec 2002 12:54:57 +0200 (IST)
Great analysis and problem resolution!
thanks
Moshe
On Tue, 3 Dec 2002, Onime Clement wrote:
>
> Subject: OpenAFS + OpenMosix
>
> I seem to have discovered a bug in the OpenAFS kernel module code that
> causes Linux to hang when running an openMosix kernel.
> This problem was reported earlier this year with previous OpenAFS and
> openMosix versions.
>
> OpenAFS version 1.2.7
> OpenMosix kernel version 2.4.18-openmosix3
> RedHat Linux 7.3
>
> Symptoms:
> =========
> When running on an openMosix kernel, some applications that try to close
> files that were mapped with the mmap call run into problems. The application
> hangs and cannot be killed, and it becomes impossible to run top, ps, kill
> or any other /proc-based utility.
> Although the computer is still usable, it is also impossible to halt or
> reboot the machine via software. Only a hard reset or power-off works.
>
> How to reproduce:
> =================
> The problem can be reproduced by running Netscape or StarOffice 5.2 with
> your home directory in AFS, on a machine running an openMosix
> kernel.
>
> Analysis:
> =========
> I traced the problem to the routine afs_linux_vma_close in the
> osi_vnodeops.c file, and in particular to the afs_close call it makes.
> Preliminary tracing using printk indicates that calls to afs_close from
> this function sometimes do not return.
> Specifically, when afs_close is called with vcp->execsOrWriters > 0, it
> blocks, which causes the hang.
>
> Patch:
> ======
> A simple patch to osi_vnodeops.c to fix this problem is included below and
> attached to this mail.
> The patch needs to be applied from the src directory, or from elsewhere
> with the appropriate -p option to patch.
>
>
> Thanks
> Clement Onime
> System and Network Analyst
> The Abdus Salam ICTP
> Trieste, Italy
>
> <--------------------- cut here --------------------->
>
> --- afs/osi_vnodeops.c.orig Thu Aug 1 21:12:01 2002
> +++ afs/osi_vnodeops.c Tue Dec 3 09:43:21 2002
> @@ -342,12 +342,26 @@
> if (vcp->mapcnt) {
> vcp->mapcnt--;
> + if (vcp->mapcnt == 0) {
> + /* This bit is supposed to undo what was
> + * done on lines */
> + if (vcp->execsOrWriters > 0)
> + vcp->execsOrWriters--;
> + if (vcp->opens > 0)
> + vcp->opens--;
> + /* vcp->states &= ~CMAPPED; */
> + }
> ReleaseWriteLock(&vcp->lock);
> if (!vcp->mapcnt) {
> credp = crref();
> - (void) afs_close(vcp, vmap->vm_file->f_flags, credp);
> + /* printk("AFSMM: afs_close: Mapcnt=%d Opens=%d execsOrWriters=%d\n", vcp->mapcnt, vcp->opens, vcp->execsOrWriters); */
> + /* It appears afs_close blocks if called when execsOrWriters > 0 */
> + if (vcp->execsOrWriters == 0)
> + (void) afs_close(vcp, vmap->vm_file->f_flags, credp);
> /* only decrement the execsOrWriters flag if this is not a writable
> * file. */
> + /* Why the limitation here */
> if (! (vmap->vm_file->f_flags & (FWRITE | FTRUNC)))
> - vcp->execsOrWriters--;
> + if (vcp->execsOrWriters > 0)
> + vcp->execsOrWriters--;
>
> vcp->states &= ~CMAPPED;
> @@ -398,6 +412,10 @@
> }
>
> +
> if (code == 0) {
> ObtainWriteLock(&vcp->lock,531);
> + /* Add an open reference on the first mapping. */
> + if (vcp->mapcnt == 0) {
> + /* Only for the first time */
> /* Set out vma ops so we catch the close. The following test should be
> * the same as used in generic_file_mmap.
> @@ -420,7 +438,5 @@
> }
>
> -
> - /* Add an open reference on the first mapping. */
> - if (vcp->mapcnt == 0) {
> + /*These variables get set when mapcnt == 0 */
> vcp->execsOrWriters++;
> vcp->opens++;
>