[OpenAFS-devel] Linux and memory mapped files hang the client, particularly when the client is OpenMosix.

George Talbot gtalbot@locuspharma.com
10 Jul 2003 15:21:52 -0400


AHA!  After commenting out the write locking in afs_linux_vma_close(), I
got the same deadlock as the one from December 2002!  So I applied the
December patch again, commented out the write locking again, and I seem
to be doing OK.

Please, if the original author of afs_linux_vma_lock() and
afs_linux_release() is out there, can you tell me if what I'm doing is
OK?

Thanks.

--
George T. Talbot
<gtalbot@locuspharma.com>


On Thu, 2003-07-10 at 14:34, George Talbot wrote:
> Another thought occurs to me:
> 
> Maybe afs_linux_vma_close() doesn't need to hold the file lock at all,
> since changes to mapcnt will be protected by AFS_GLOCK() anyway, right?
> 
> See the comment in afs_linux_release() for the flushcnt--can the same
> rationale apply to mapcnt?
> 
> --
> George T. Talbot
> <gtalbot@locuspharma.com>
> 
> 
> On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> > Hi,
> > 
> > We have a problem here with clients that make heavy use of mmap() on
> > files stored on an AFS server.  The program doing the mmap() access will
> > hang, as will top, ps, etc.  I'm using Linux kernel 2.4.20, with OpenAFS
> > 1.2.8, and OpenMosix, though I don't think it's OpenMosix, because if I
> > recompile the kernel without OpenMosix, I still get hangs, just not as
> > frequently.
> > 
> > So I found this patch:
> > 
> > https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
> > 
> > This patch does not work for us.  I did some further investigation of
> > where the program is hanging.  The programs hang in
> > afs_linux_vma_close() right when this function tries to acquire a write
> > lock on the vcache entry for the file.  I added some instrumentation,
> > and found that the holder of the lock is afs_GetDCache().
> > When the problem occurs, afs_GetDCache() has acquired the lock at
> > position #66 (search for ",66)" in the source code), and this lock has
> > been converted to a shared lock.
> > 
> > The sequence of events, I believe, is this:
> > 
> > afs_GetDCache() has the AFS_GLOCK(), acquires the write lock for the
> > file, converts the write lock to a shared lock, and drops AFS_GLOCK()
> > while still holding the shared lock on the file, and starts reading
> > blocks.
> > 
> > At this point afs_linux_vma_close() gets called because the application
> > is unmapping the file, acquires the AFS_GLOCK(), and blocks trying to
> > acquire the write lock on the file, which afs_GetDCache() still holds
> > as a shared lock.
> > 
> > Then, I believe that afs_GetDCache() runs again after the read
> > completes, tries to acquire the AFS_GLOCK() and blocks.
> > 
> > Classic deadlock.
> > 
> > Any ideas how to fix this?  I think that afs_linux_mmap() and
> > afs_linux_vma_close() are using the write lock to mutually exclude each
> > other, so I think the code still needs to hold the lock.  However, it
> > seems to me a classic case of deadlock to drop the global lock out of
> > order with the file lock.  Should afs_linux_vma_close() somehow wait for
> > any pending reads to complete?  Is there a way to do that?
> > 
> > The previous patch seemed only to change the timing a bit.
> > 
> > Thanks for any insight.
> > 
> > --
> > George T. Talbot
> > <gtalbot@locuspharma.com>
> > 
> > _______________________________________________
> > OpenAFS-devel mailing list
> > OpenAFS-devel@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-devel
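
The sequence of events in the original report can be replayed as a small,
single-threaded model. This is only a sketch, not OpenAFS code: the names
"struct model", "try_acquire", "release", "deadlocked" and "replay" are
hypothetical, each lock is reduced to "who holds it", and the shared lock
held by afs_GetDCache() is treated as simply excluding the writer. It
shows the lock-order inversion from the report, and that the interleaving
no longer deadlocks in this model if afs_linux_vma_close() relies on the
global lock alone. It does not show that dropping the file lock is safe
for mapcnt in the real client.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical actors standing in for the two code paths in the thread. */
enum actor { NOBODY = 0, GETDCACHE, VMA_CLOSE, NACTORS };

/* A lock is modeled only by its current holder (NOBODY means free). */
struct lock { enum actor holder; };

struct model {
    struct lock glock;             /* stands in for AFS_GLOCK() */
    struct lock vcache;            /* stands in for the per-file vcache lock */
    enum actor waits_for[NACTORS]; /* waits_for[a]: holder that a is blocked on */
};

/* Take the lock if it is free; otherwise record who the caller waits for. */
static bool try_acquire(struct model *m, struct lock *l, enum actor who)
{
    if (l->holder == NOBODY) {
        l->holder = who;
        m->waits_for[who] = NOBODY;
        return true;
    }
    m->waits_for[who] = l->holder;
    return false;
}

static void release(struct lock *l)
{
    l->holder = NOBODY;
}

/* Two actors are deadlocked when each one waits for the other. */
static bool deadlocked(const struct model *m)
{
    return m->waits_for[GETDCACHE] == VMA_CLOSE &&
           m->waits_for[VMA_CLOSE] == GETDCACHE;
}

/* Replay the reported interleaving; returns true if it ends in deadlock. */
bool replay(bool vma_close_takes_file_lock)
{
    struct model m;
    memset(&m, 0, sizeof m);       /* every lock free, nobody waiting */

    /* afs_GetDCache(): global lock, then file lock, then drop the
     * global lock for I/O while still holding the file lock. */
    try_acquire(&m, &m.glock, GETDCACHE);
    try_acquire(&m, &m.vcache, GETDCACHE);
    release(&m.glock);

    /* afs_linux_vma_close(): global lock, then (optionally) file lock. */
    try_acquire(&m, &m.glock, VMA_CLOSE);
    if (vma_close_takes_file_lock) {
        /* Blocks: the file lock is still held by afs_GetDCache(). */
        try_acquire(&m, &m.vcache, VMA_CLOSE);
    } else {
        /* Assumption under discussion: mapcnt is updated under the
         * global lock only, which is then released. */
        release(&m.glock);
    }

    /* afs_GetDCache(): the read completed, it wants the global lock back. */
    try_acquire(&m, &m.glock, GETDCACHE);

    return deadlocked(&m);
}
```

In this model replay(true) ends with each path waiting on a lock the other
holds, while replay(false) lets afs_GetDCache() reacquire the global lock
and continue.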