[OpenAFS-devel] Linux and memory-mapped files hang the client, particularly when the client runs OpenMosix.
George Talbot
gtalbot@locuspharma.com
10 Jul 2003 15:21:52 -0400
AHA! After commenting out the write locking in afs_linux_vma_close(), I
got the same deadlock as the December 2002 one! So I applied the
December patch again, left the write locking commented out, and things
seem to be OK so far.
Please, if the original author of afs_linux_vma_close() and
afs_linux_release() is out there, can you tell me whether what I'm
doing is OK?
Thanks.
--
George T. Talbot
<gtalbot@locuspharma.com>
On Thu, 2003-07-10 at 14:34, George Talbot wrote:
> Another thought occurs to me:
>
> Maybe afs_linux_vma_close() doesn't need to hold the file lock at all,
> since changes to mapcnt are protected by AFS_GLOCK() anyway, right?
>
> See the comment in afs_linux_release() about flushcnt. Can the same
> rationale apply to mapcnt?
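A minimal user-space sketch of the idea above, assuming (1) a single global mutex stands in for AFS_GLOCK(), (2) mapcnt is the only state the vma open/close paths need to update, and (3) every path that touches mapcnt takes that global lock and does not drop it in between. struct vcache_model and the model_* functions are hypothetical, not OpenAFS code; whether assumption (3) actually holds in OpenAFS 1.2.8 is exactly the open question.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for AFS_GLOCK(): one global mutex. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, stripped-down vcache entry: only the map count. */
struct vcache_model {
    int mapcnt;
};

/* mmap path: bump the map count while holding only the global lock. */
void model_vma_open(struct vcache_model *vc)
{
    pthread_mutex_lock(&glock);
    vc->mapcnt++;
    pthread_mutex_unlock(&glock);
}

/* munmap path: drop the map count; no per-file write lock is taken. */
void model_vma_close(struct vcache_model *vc)
{
    pthread_mutex_lock(&glock);
    assert(vc->mapcnt > 0);
    vc->mapcnt--;
    pthread_mutex_unlock(&glock);
}

/* Stress-test thread body: map and unmap the same file repeatedly. */
void *model_map_unmap_worker(void *arg)
{
    struct vcache_model *vc = arg;
    for (int i = 0; i < 100000; i++) {
        model_vma_open(vc);
        model_vma_close(vc);
    }
    return NULL;
}
```

Under those assumptions, several threads running model_map_unmap_worker() on the same struct vcache_model leave mapcnt at 0 once joined, with no per-file lock involved.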
>
> --
> George T. Talbot
> <gtalbot@locuspharma.com>
>
>
> On Thu, 2003-07-10 at 14:02, George Talbot wrote:
> > Hi,
> >
> > We have a problem here with clients that make heavy use of mmap() on
> > files stored on an AFS server. The program doing the mmap() access
> > hangs, as do top, ps, etc. I'm using Linux kernel 2.4.20 with OpenAFS
> > 1.2.8 and OpenMosix. I don't think OpenMosix is the cause, though: if
> > I recompile the kernel without OpenMosix, I still get hangs, just less
> > frequently.
> >
> > So I found this patch:
> >
> > https://lists.openafs.org/pipermail/openafs-devel/2002-December/003624.html
> >
> > This patch does not work for us. I investigated further where the
> > program hangs: it hangs in afs_linux_vma_close(), right when that
> > function tries to acquire a write lock on the vcache entry for the
> > file. I added some instrumentation and found that the holder of the
> > lock is afs_GetDCache(). When the problem occurs, afs_GetDCache() has
> > acquired the lock at position #66 (search for ",66)" in the source
> > code) and has converted it to a shared lock.
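To make the "acquired at #66, then converted to shared" state concrete, here is a hypothetical user-space reader/writer lock that remembers the call-site number of its last write acquisition and supports an in-place write-to-shared downgrade. It is loosely inspired by the position numbers visible in the ",66)" calls, but struct pos_lock and every pos_* function are invented for illustration and are not OpenAFS's lock API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical position-tagged reader/writer lock (not struct afs_lock). */
struct pos_lock {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int readers;   /* number of shared holders */
    int writer;    /* 1 while held exclusively */
    int last_pos;  /* call-site id of the last write acquisition */
};

void pos_lock_init(struct pos_lock *l)
{
    pthread_mutex_init(&l->mu, NULL);
    pthread_cond_init(&l->cv, NULL);
    l->readers = 0;
    l->writer = 0;
    l->last_pos = 0;
}

/* Blocking exclusive acquire; records where it was taken. */
void pos_obtain_write(struct pos_lock *l, int pos)
{
    pthread_mutex_lock(&l->mu);
    while (l->writer || l->readers > 0)
        pthread_cond_wait(&l->cv, &l->mu);
    l->writer = 1;
    l->last_pos = pos;
    pthread_mutex_unlock(&l->mu);
}

/* Non-blocking exclusive acquire: returns 1 on success, 0 if it would block. */
int pos_try_write(struct pos_lock *l, int pos)
{
    int ok;
    pthread_mutex_lock(&l->mu);
    ok = !l->writer && l->readers == 0;
    if (ok) {
        l->writer = 1;
        l->last_pos = pos;
    }
    pthread_mutex_unlock(&l->mu);
    return ok;
}

/* Downgrade write -> shared without ever releasing the lock, so no other
 * writer can slip in; a later writer must still wait for this shared hold. */
void pos_convert_w_to_s(struct pos_lock *l)
{
    pthread_mutex_lock(&l->mu);
    assert(l->writer && l->readers == 0);
    l->writer = 0;
    l->readers = 1;
    pthread_mutex_unlock(&l->mu);
}

/* Drop one shared hold; wake blocked writers when the last one goes. */
void pos_release_shared(struct pos_lock *l)
{
    pthread_mutex_lock(&l->mu);
    assert(l->readers > 0);
    if (--l->readers == 0)
        pthread_cond_broadcast(&l->cv);
    pthread_mutex_unlock(&l->mu);
}
```

After pos_obtain_write(&l, 66) and pos_convert_w_to_s(&l), pos_try_write() fails until the shared hold is released, which mirrors the situation afs_linux_vma_close() is described as finding itself in.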
> >
> > The sequence of events, I believe, is this:
> >
> > afs_GetDCache() holds AFS_GLOCK(), acquires the write lock for the
> > file, converts the write lock to a shared lock, drops AFS_GLOCK()
> > while still holding the shared lock on the file, and starts reading
> > blocks.
> >
> > At this point afs_linux_vma_close() is called because the application
> > is unmapping the file; it acquires AFS_GLOCK() and blocks trying to
> > acquire the write lock, which afs_GetDCache() still holds as shared.
> >
> > Then, I believe, afs_GetDCache() resumes after the read completes,
> > tries to reacquire AFS_GLOCK(), and blocks.
> >
> > Classic deadlock.
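The sequence above is a lock-order inversion. A toy order checker (in the spirit of, but far simpler than, the Linux kernel's lockdep) makes that explicit: give the global lock a lower rank than the per-file lock, require locks to be taken in increasing rank, and replay both code paths as described. The ranks, struct held_set, and the replay_* functions are hypothetical; they encode the reading of the code given above, not a trace of it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical ranks: the global lock must always precede the file lock. */
enum lock_rank { RANK_GLOCK = 1, RANK_FILE = 2 };

#define MAX_HELD 8

/* Per-thread set of ranks of the locks currently held. */
struct held_set {
    int ranks[MAX_HELD];
    int n;
};

/* Record an acquisition; return 0 (and report) if a lock of equal or
 * higher rank is already held, i.e. the ordering rule is violated. */
int checked_acquire(struct held_set *h, int rank, const char *who)
{
    int ok = 1;
    for (int i = 0; i < h->n; i++) {
        if (h->ranks[i] >= rank) {
            fprintf(stderr, "%s: rank %d taken while holding rank %d\n",
                    who, rank, h->ranks[i]);
            ok = 0;
        }
    }
    assert(h->n < MAX_HELD);
    h->ranks[h->n++] = rank;
    return ok;
}

/* Forget a held lock (swap-remove; only membership matters here). */
void checked_release(struct held_set *h, int rank)
{
    for (int i = 0; i < h->n; i++) {
        if (h->ranks[i] == rank) {
            h->ranks[i] = h->ranks[--h->n];
            return;
        }
    }
    assert(!"released a lock that was not held");
}

/* The afs_GetDCache() path as described; returns the violation count. */
int replay_getdcache(void)
{
    struct held_set h = { .n = 0 };
    int bad = 0;
    bad += !checked_acquire(&h, RANK_GLOCK, "getdcache");
    bad += !checked_acquire(&h, RANK_FILE, "getdcache"); /* write, then shared */
    checked_release(&h, RANK_GLOCK);                     /* drop for the I/O */
    bad += !checked_acquire(&h, RANK_GLOCK, "getdcache"); /* retake: inverted */
    checked_release(&h, RANK_GLOCK);
    checked_release(&h, RANK_FILE);
    return bad;
}

/* The afs_linux_vma_close() path as described. */
int replay_vma_close(void)
{
    struct held_set h = { .n = 0 };
    int bad = 0;
    bad += !checked_acquire(&h, RANK_GLOCK, "vma_close");
    bad += !checked_acquire(&h, RANK_FILE, "vma_close");
    checked_release(&h, RANK_FILE);
    checked_release(&h, RANK_GLOCK);
    return bad;
}
```

replay_getdcache() reports one violation (retaking the global lock while still holding the file lock) and replay_vma_close() reports none. An actual hang needs both paths to interleave as described; a checker like this flags the inversion without having to hit the race.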
> >
> > Any ideas how to fix this? I think afs_linux_mmap() and
> > afs_linux_vma_close() use the write lock to exclude each other, so
> > the code probably still needs to hold the lock. However, dropping the
> > global lock out of order with respect to the file lock looks to me
> > like a classic recipe for deadlock. Should afs_linux_vma_close()
> > somehow wait for any pending reads to complete? Is there a way to do
> > that?
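One possible shape for "wait for pending reads", sketched in user space under stated assumptions: the reader registers an in-flight read under the global lock before dropping it for I/O, and the close path waits on a condition variable until that count drains instead of taking the per-file write lock. pthread_cond_wait() releases the global mutex while sleeping, which is what breaks the cycle here; a kernel version would need a sleep primitive with the same property. struct file_model, pending_reads, and the model_* functions are hypothetical, and the sketch does not settle whether afs_linux_mmap() and afs_linux_vma_close() still need to exclude each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical: one global lock (standing in for AFS_GLOCK()) guards both
 * the map count and the number of reads in flight on the file. */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reads_done = PTHREAD_COND_INITIALIZER;

struct file_model {
    int mapcnt;
    int pending_reads;
};

/* Reader: announce the read under the global lock, drop the lock for the
 * slow I/O, then retake it and wake anyone waiting for reads to drain. */
void *model_read(void *arg)
{
    struct file_model *f = arg;
    struct timespec io = { 0, 50 * 1000 * 1000 }; /* simulated 50 ms read */

    pthread_mutex_lock(&glock);
    f->pending_reads++;
    pthread_mutex_unlock(&glock);

    nanosleep(&io, NULL);

    pthread_mutex_lock(&glock);
    if (--f->pending_reads == 0)
        pthread_cond_broadcast(&reads_done);
    pthread_mutex_unlock(&glock);
    return NULL;
}

/* Close path: wait until no reads are in flight, then drop the map count.
 * pthread_cond_wait() atomically releases glock while sleeping, so the
 * reader can always retake it to finish. */
void model_vma_close_wait(struct file_model *f)
{
    pthread_mutex_lock(&glock);
    while (f->pending_reads > 0)
        pthread_cond_wait(&reads_done, &glock);
    assert(f->mapcnt > 0);
    f->mapcnt--;
    pthread_mutex_unlock(&glock);
}
```

The design choice is that the close path never sleeps while holding the global lock, so the reader's reacquisition cannot be blocked by it.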
> >
> > The previous patch seemed only to change the timing a bit.
> >
> > Thanks for any insight.
> >
> > --
> > George T. Talbot
> > <gtalbot@locuspharma.com>
> >
> > _______________________________________________
> > OpenAFS-devel mailing list
> > OpenAFS-devel@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-devel