[OpenAFS-devel] Linux and memory mapped files hang the client, particularly when the client is OpenMosix.

Derek Atkins warlord@MIT.EDU
10 Jul 2003 17:09:28 -0400


Have you tried 1.2.10-pre-release?

-derek

George Talbot <gtalbot@locuspharma.com> writes:

> AHA!  After commenting out the write locking in afs_linux_vma_close(), I
> got the same deadlock as the 2002 December one!  So I applied the
> December patch again, and re-commented out the write locking, and I seem
> to be doing OK.
> 
> Please, if the original author of afs_linux_vma_lock() and
> afs_linux_release() is out there, can you tell me if what I'm doing is
> OK?
> 
> Thanks.
> 
> --
> George T. Talbot
> <gtalbot@locuspharma.com>
> 
> 
> On Thu, 2003-07-10 at 14:34, George Talbot wrote:
> > Another thought occurs to me:
> > 
> > Maybe afs_linux_vma_close() doesn't need to hold the file lock at all,
> > since changes to mapcnt will be protected by AFS_GLOCK() anyway, right?
> > 
> > See the comment in afs_linux_release() for the flushcnt--can the same
> > rationale apply to mapcnt?
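
A rough user-space model of that idea, for what it is worth: the close path touches mapcnt under the global lock only and never asks for the file lock. Everything here is a stand-in (Python threading locks for AFS_GLOCK() and the vcache lock, a dict entry for mapcnt, an Event for the blocking read), so it only shows that this one interleaving no longer wedges. It says nothing about whether skipping the file lock is safe against afs_linux_mmap().

```python
import threading


def run_with_close_under_glock_only():
    """Model the close path taking only the global lock (hypothetical names)."""
    afs_glock = threading.Lock()    # stand-in for AFS_GLOCK()
    vcache_lock = threading.Lock()  # stand-in for the per-file vcache lock
    state = {"mapcnt": 1}           # stand-in for the mapping count
    read_started = threading.Event()
    close_done = threading.Event()

    def getdcache_like():
        # Take both locks, then drop only the global lock for the "read".
        afs_glock.acquire()
        vcache_lock.acquire()
        afs_glock.release()
        read_started.set()
        close_done.wait(timeout=5)  # stands in for the blocking read
        # Read finished: retake the global lock, then release everything.
        afs_glock.acquire()
        vcache_lock.release()
        afs_glock.release()

    def vma_close_like():
        read_started.wait(timeout=5)
        # Only the global lock is held while mapcnt changes.
        with afs_glock:
            state["mapcnt"] -= 1
        close_done.set()

    threads = [threading.Thread(target=getdcache_like),
               threading.Thread(target=vma_close_like)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return state["mapcnt"], all(not t.is_alive() for t in threads)


if __name__ == "__main__":
    print(run_with_close_under_glock_only())
```

In this model both threads finish and mapcnt ends at 0, because the close path never waits on anything the reader holds.
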
> > 
> > --
> > George T. Talbot
> > <gtalbot@locuspharma.com>
> > 
> > 
> > On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> > > Hi,
> > > 
> > > We have a problem here with clients that make heavy use of mmap() on
> > > files stored on an AFS server.  The program doing the mmap() access will
> > > hang, as will top, ps, etc.  I'm using Linux kernel 2.4.20, with OpenAFS
> > > 1.2.8, and OpenMosix, though I don't think it's OpenMosix, because if I
> > > recompile the kernel without OpenMosix, I still get hangs, just not as
> > > frequently.
> > > 
> > > So I found this patch:
> > > 
> > > https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
> > > 
> > > This patch does not work for us.  I did some further investigation of
> > > where the program is hanging.  The programs hang in
> > > afs_linux_vma_close() right when this function tries to acquire a write
> > > lock on the vcache entry for the file.  I added some instrumentation,
> > > and found that the holder of the lock is afs_GetDCache().  When the
> > > problem occurs, afs_GetDCache() has acquired the lock at position #66
> > > (search for ",66)" in the source code), and this lock has been
> > > converted to a shared lock.
> > > 
> > > The sequence of events, I believe, is this:
> > > 
> > > afs_GetDCache() has the AFS_GLOCK(), acquires the write lock for the
> > > file, converts the write lock to a shared lock, and drops AFS_GLOCK()
> > > while still holding the shared lock on the file, and starts reading
> > > blocks.
> > > 
> > > At this point afs_linux_vma_close() gets called because the application
> > > is unmapping the file, acquires the AFS_GLOCK(), and blocks trying to
> > > acquire the write lock on the file, which afs_GetDCache() still holds
> > > as a shared lock.
> > > 
> > > Then, I believe that afs_GetDCache() runs again after the read
> > > completes, tries to acquire the AFS_GLOCK() and blocks.
> > > 
> > > Classic deadlock.
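
For anyone following along, the sequence above can be replayed in a small user-space model. This is only a sketch under loud assumptions: two plain Python locks stand in for AFS_GLOCK() and the vcache lock, the write-to-shared conversion is flattened into a single mutex, and the function name is made up for illustration. Non-blocking acquires are used so the model reports where each side would sleep instead of actually hanging.

```python
import threading


def replay_reported_interleaving():
    """Replay the three reported steps; return which acquisitions succeed."""
    afs_glock = threading.Lock()    # stand-in for AFS_GLOCK()
    vcache_lock = threading.Lock()  # stand-in for the per-file vcache lock

    # Step 1 (afs_GetDCache path): take the global lock, take the file
    # lock, then drop only the global lock before starting the read.
    afs_glock.acquire()
    vcache_lock.acquire()
    afs_glock.release()

    # Step 2 (afs_linux_vma_close path): take the global lock, then ask
    # for the file lock.  The non-blocking attempt returns False where
    # the real code would sleep.
    afs_glock.acquire()
    close_got_file_lock = vcache_lock.acquire(blocking=False)

    # Step 3 (afs_GetDCache path again): the read is done, so it asks for
    # the global lock back while still holding the file lock.
    getdcache_got_glock = afs_glock.acquire(blocking=False)

    # Release what the model still holds so the function can return.
    vcache_lock.release()
    afs_glock.release()

    return close_got_file_lock, getdcache_got_glock


if __name__ == "__main__":
    print(replay_reported_interleaving())
```

Both values come back False: the close path holds the global lock and wants the file lock, while the read path holds the file lock and wants the global lock.
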
> > > 
> > > Any ideas how to fix this?  I think that afs_linux_mmap() and
> > > afs_linux_vma_close() are using the write lock to mutually exclude each
> > > other, so I think the code still needs to hold the lock.  However, it
> > > seems to me a classic case of deadlock to drop the global lock out of
> > > order with the file lock.  Should afs_linux_vma_close() somehow wait for
> > > any pending reads to complete?  Is there a way to do that?
> > > 
> > > The previous patch seemed only to change the timing a bit.
> > > 
> > > Thanks for any insight.
> > > 
> > > --
> > > George T. Talbot
> > > <gtalbot@locuspharma.com>
> > > 
> > > _______________________________________________
> > > OpenAFS-devel mailing list
> > > OpenAFS-devel@openafs.org
> > > https://lists.openafs.org/mailman/listinfo/openafs-devel

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available