[OpenAFS-devel] Linux and memory mapped files hang the client, particularly when the client is OpenMosix.

George Talbot gtalbot@locuspharma.com
10 Jul 2003 14:34:24 -0400


Another thought occurs to me:

Maybe afs_linux_vma_close() doesn't need to hold the file lock at all,
since changes to mapcnt will be protected by AFS_GLOCK() anyway, right?

See the comment in afs_linux_release() about flushcnt; can the same
rationale apply to mapcnt?
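To make the question concrete, here is a rough userspace sketch (pthreads, not kernel code) of what it would mean for mapcnt to be protected by the global lock alone. The glock mutex stands in for AFS_GLOCK(); vcache_sketch, sketch_mmap(), sketch_vma_close() and run_workers() are hypothetical names. The idea only holds up if every other reader and writer of mapcnt also takes the global lock, which this sketch assumes rather than checks.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for AFS_GLOCK()/AFS_GUNLOCK(). */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily simplified vcache entry. */
struct vcache_sketch {
    int mapcnt; /* live mappings; in this sketch protected by glock alone */
};

/* Model of afs_linux_mmap() bumping the mapping count. */
static void sketch_mmap(struct vcache_sketch *vc)
{
    pthread_mutex_lock(&glock);
    vc->mapcnt++;
    pthread_mutex_unlock(&glock);
}

/* Model of afs_linux_vma_close() without taking the per-file lock. */
static void sketch_vma_close(struct vcache_sketch *vc)
{
    pthread_mutex_lock(&glock);
    if (vc->mapcnt > 0)
        vc->mapcnt--;
    pthread_mutex_unlock(&glock);
}

struct worker_arg {
    struct vcache_sketch *vc;
    int iters;
    int unmap; /* nonzero: pair every map with a close */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        sketch_mmap(a->vc);
        if (a->unmap)
            sketch_vma_close(a->vc);
    }
    return NULL;
}

/* Run nthreads concurrent workers and return the final mapcnt. */
int run_workers(int nthreads, int iters, int unmap)
{
    struct vcache_sketch vc = { 0 };
    struct worker_arg arg = { &vc, iters, unmap };
    pthread_t tids[MAX_WORKERS];

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return vc.mapcnt;
}
```

With every update under the one mutex, run_workers(8, 10000, 1) should end with mapcnt back at 0, and run_workers(8, 10000, 0) at 80000.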

--
George T. Talbot
<gtalbot@locuspharma.com>


On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> Hi,
> 
> We have a problem here with clients that make heavy use of mmap() on
> files stored on an AFS server.  The program doing the mmap() access
> hangs, and so do top, ps, etc.  I'm using Linux kernel 2.4.20 with
> OpenAFS 1.2.8 and OpenMosix.  I don't think OpenMosix is the cause,
> though: if I recompile the kernel without OpenMosix, I still get hangs,
> just less frequently.
> 
> So I found this patch:
> 
> https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
> 
> This patch does not work for us, so I investigated further where the
> program is hanging.  The programs hang in afs_linux_vma_close(), right
> when this function tries to acquire a write lock on the vcache entry for
> the file.  I added some instrumentation and found that the holder of the
> lock is afs_GetDCache().  When the problem occurs, afs_GetDCache() has
> acquired the lock at position #66 (search for ",66)" in the source code),
> and this lock has been converted to a shared lock.
> 
> The sequence of events, I believe, is this:
> 
> afs_GetDCache() holds AFS_GLOCK(), acquires the write lock for the
> file, and converts it to a shared lock.  It then drops AFS_GLOCK() while
> still holding the shared lock on the file and starts reading blocks.
> 
> At this point afs_linux_vma_close() gets called because the application
> is unmapping the file.  It acquires AFS_GLOCK() and then blocks trying to
> acquire the write lock on the file, which afs_GetDCache() still holds in
> shared mode.
> 
> Then, I believe, afs_GetDCache() resumes after the read completes,
> tries to reacquire AFS_GLOCK(), and blocks.
> 
> Classic deadlock.
> 
> Any ideas how to fix this?  I think that afs_linux_mmap() and
> afs_linux_vma_close() use the write lock to exclude each other, so the
> code probably still needs to hold the lock.  However, dropping the global
> lock out of order with respect to the file lock looks to me like a
> classic recipe for deadlock.  Should afs_linux_vma_close() somehow wait
> for any pending reads to complete?  Is there a way to do that?
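One way to let afs_linux_vma_close() wait for pending reads without the cycle is to sleep on a condition variable that releases the global lock while waiting. This is a userspace sketch of that idea only, not a tested kernel change: glock stands in for AFS_GLOCK(), and pending_reads, read_begin(), read_end(), vma_close_wait() and run_close_during_read() are hypothetical. It also assumes afs_GetDCache() could be changed to bump and drop such a counter around its I/O.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for AFS_GLOCK(); it protects every field below. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, simplified vcache entry. */
struct vcache_sketch {
    int mapcnt;                /* live mappings */
    int pending_reads;         /* reads in flight with glock dropped */
    int closer_waiting;        /* vma_close is asleep on reads_done */
    pthread_cond_t reads_done; /* signaled when pending_reads hits 0 */
};

/* Reader side, glock held: mark a read in flight before dropping glock. */
static void read_begin(struct vcache_sketch *vc)
{
    vc->pending_reads++;
}

/* Reader side, glock held again after the I/O: wake any waiting closer. */
static void read_end(struct vcache_sketch *vc)
{
    if (--vc->pending_reads == 0)
        pthread_cond_broadcast(&vc->reads_done);
}

/* Hypothetical vma_close: wait for in-flight reads before touching mapcnt. */
static void *vma_close_wait(void *p)
{
    struct vcache_sketch *vc = p;

    pthread_mutex_lock(&glock);
    while (vc->pending_reads > 0) {
        vc->closer_waiting = 1;
        /* cond_wait releases glock while asleep, so the reader can retake
         * it to finish; the closer never sleeps while holding glock. */
        pthread_cond_wait(&vc->reads_done, &glock);
    }
    vc->closer_waiting = 0;
    vc->mapcnt--;
    pthread_mutex_unlock(&glock);
    return NULL;
}

/* One unmap racing one read; returns the final mapcnt (expected 0). */
int run_close_during_read(void)
{
    struct vcache_sketch vc = { .mapcnt = 1 };
    pthread_t t;

    pthread_cond_init(&vc.reads_done, NULL);

    pthread_mutex_lock(&glock);
    read_begin(&vc);              /* GetDCache-like path starts a read */
    pthread_mutex_unlock(&glock); /* ...and drops glock for the I/O */

    int rc = pthread_create(&t, NULL, vma_close_wait, &vc);
    assert(rc == 0);
    (void)rc;

    /* Spin until the closer has released glock inside pthread_cond_wait. */
    for (;;) {
        pthread_mutex_lock(&glock);
        if (vc.closer_waiting)
            break;                /* keep glock and finish the read */
        pthread_mutex_unlock(&glock);
        sched_yield();
    }
    read_end(&vc);                /* I/O done: glock retaken, closer woken */
    pthread_mutex_unlock(&glock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&vc.reads_done);
    assert(vc.pending_reads == 0);
    return vc.mapcnt;
}
```

The design point is that the closer rechecks pending_reads in a loop and only ever waits through pthread_cond_wait(), which drops glock, so the reader can always retake glock to finish. Whether AFS's own sleep primitives give the same guarantee in the kernel would need checking.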
> 
> The previous patch seemed only to change the timing a bit.
> 
> Thanks for any insight.
> 
> --
> George T. Talbot
> <gtalbot@locuspharma.com>
> 
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel