[OpenAFS-devel] Openafs-1.3.74 file delete race condition?

Troy Benjegerdes hozer@hozed.org
Thu, 13 Jan 2005 10:54:12 -0600


I seem to have reproducible problems with openafs 1.3.74 on an SMP
machine serving imap from Maildirs in AFS. It seems like there is either
a lock problem, or two processes trying to move the same file from the
maildir tmp directory to somewhere else.

imap:~# ps -eo pid,user,s,nwchan,wchan=WIDE-WCHAN-COLUMN,args --sort user | grep brett
 8500 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8504 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8551 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8561 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8565 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8569 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8573 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8581 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8596 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8608 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8618 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8741 brett    S 157c39 select            /usr/bin/imapd Maildir
 8747 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8755 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8772 brett    S 157c39 select            /usr/bin/imapd Maildir
 8843 root     S 1504e8 pipe_wait         grep brett
imap:~#
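
A listing like this is easier to read once it is tallied by process state
and wait channel. The following is only a sketch: the function name
tally_states is made up for illustration, and the sample text is the ps
output pasted from above.

```python
from collections import Counter

# The ps output from above, pasted verbatim.
PS_OUTPUT = """\
 8500 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8504 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8551 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8561 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8565 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8569 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8573 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8581 brett    S 98ced1 ?                 /usr/bin/imapd Maildir
 8596 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8608 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8618 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8741 brett    S 157c39 select            /usr/bin/imapd Maildir
 8747 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8755 brett    D 1079c3 down              /usr/bin/imapd Maildir
 8772 brett    S 157c39 select            /usr/bin/imapd Maildir
 8843 root     S 1504e8 pipe_wait         grep brett
"""


def tally_states(ps_text):
    """Count processes per (state, wchan) pair.

    Expects the column order pid,user,s,nwchan,wchan,args used above.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        _pid, _user, state, _nwchan, wchan, _args = fields
        counts[(state, wchan)] += 1
    return counts


if __name__ == "__main__":
    for (state, wchan), n in sorted(tally_states(PS_OUTPUT).items()):
        print(state, wchan, n)
```

For the listing above this gives nine processes in D state waiting in
down, and four in S state with an unresolved wait channel.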
imap:~# cmdebug localhost
** Cache entry @ 0xf8a6cb60 for 134.536871010.1277.285830
[<my-afs-cell>]
    locks: (writer_waiting, 3 read_locks(pid:8581), 1 waiters)
    18432 bytes DV 1013276 refcnt 1
    callback f697a3e0   expires 1105643469
    0 opens     0 writers
    normal file
    states (0x1), stat'd

From /proc/ksyms:
f898cbb0 afs_osi_InitWaitHandle [libafs-2.4.27-1-686-smp.mp]
f898cbd0 afs_osi_CancelWait     [libafs-2.4.27-1-686-smp.mp]
f898cc10 afs_osi_Wait   [libafs-2.4.27-1-686-smp.mp]
f898cdc0 afs_osi_SleepSig       [libafs-2.4.27-1-686-smp.mp]  <-----
f898cf40 afs_osi_Sleep  [libafs-2.4.27-1-686-smp.mp]
f898d180 afs_osi_Wakeup [libafs-2.4.27-1-686-smp.mp]
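
The arrow above comes from matching the nwchan value against the symbol
table by hand. A rough sketch of that lookup follows. It assumes that the
nwchan value 98ced1 shown by ps is the kernel address f898ced1 with its
leading digits cut off by the column width; that is my reading, not
something ps states. The helper names parse_ksyms and resolve are
invented for this example.

```python
import bisect

# The /proc/ksyms excerpt from above (module name column is ignored).
KSYMS = """\
f898cbb0 afs_osi_InitWaitHandle [libafs-2.4.27-1-686-smp.mp]
f898cbd0 afs_osi_CancelWait     [libafs-2.4.27-1-686-smp.mp]
f898cc10 afs_osi_Wait   [libafs-2.4.27-1-686-smp.mp]
f898cdc0 afs_osi_SleepSig       [libafs-2.4.27-1-686-smp.mp]
f898cf40 afs_osi_Sleep  [libafs-2.4.27-1-686-smp.mp]
f898d180 afs_osi_Wakeup [libafs-2.4.27-1-686-smp.mp]
"""


def parse_ksyms(text):
    """Return a sorted list of (address, symbol) pairs."""
    table = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table.append((int(parts[0], 16), parts[1]))
    table.sort()
    return table


def resolve(table, addr):
    """Find the closest symbol at or below addr, plus the offset into it."""
    addrs = [a for a, _ in table]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address is below every known symbol
    base, name = table[i]
    return name, addr - base


if __name__ == "__main__":
    table = parse_ksyms(KSYMS)
    # ASSUMPTION: 98ced1 from ps is really 0xf898ced1.
    print(resolve(table, 0xF898CED1))
```

With only an excerpt of the symbol table, the match is the nearest exported
symbol below the address, so a non-exported function in between would be
misattributed.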


And that inode reported by cmdebug looks to be the following directory:
ls -id Maildir/.Trash-Work/tmp/
6423805 Maildir/.Trash-Work/tmp/
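
For reference, here is the arithmetic that ties the cmdebug entry to that
inode number. It is a sketch under two assumptions: that the four numbers
in the cache entry are cell.volume.vnode.unique, and that the Linux client
builds the inode number as (volume << 16) + vnode masked to 31 bits. The
exact mask is a guess on my part, but the result agrees with the ls -id
output above. The function names are made up.

```python
def parse_fid(text):
    """Split a cmdebug 'cell.volume.vnode.unique' string into integers."""
    cell, volume, vnode, unique = (int(part) for part in text.split("."))
    return cell, volume, vnode, unique


def fid_to_inode(volume, vnode, mask=0x7FFFFFFF):
    """ASSUMED mapping from an AFS volume/vnode pair to a Linux inode number."""
    return ((volume << 16) + vnode) & mask


if __name__ == "__main__":
    _cell, volume, vnode, _unique = parse_fid("134.536871010.1277.285830")
    print(fid_to_inode(volume, vnode))  # 6423805
```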




On Wed, Jan 12, 2005 at 03:12:15PM -0500, chas williams - CONTRACTOR wrote:
> well the stuff stuck in the 'D' state is likely a deadlock somewhere
> in the afs code due to a mishandled error condition (like one of the
> afs function calls returns EWHATEVER) and the caller returns before
> undoing some important lock.
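
To make that failure mode concrete, here is a user-space sketch of an
early return that leaks a lock, next to a version that cannot. It is an
analogy only, not the afs code: the lock, the function names and the
choice of ENOSPC as the error are all invented for illustration.

```python
import errno
import threading

glock = threading.Lock()  # hypothetical stand-in for a lock in the afs code


def store_buggy(out_of_space):
    """Error path returns early and never releases the lock."""
    glock.acquire()
    if out_of_space:
        return errno.ENOSPC  # BUG: glock is still held here
    glock.release()
    return 0


def store_fixed(out_of_space):
    """The with-block releases the lock on every return path."""
    with glock:
        if out_of_space:
            return errno.ENOSPC
        return 0


if __name__ == "__main__":
    store_fixed(True)
    print("after fixed:", glock.locked())   # lock was released
    store_buggy(True)
    print("after buggy:", glock.locked())   # lock was leaked
    glock.release()                         # clean up after the demo
```

Once the buggy path has run, every later caller blocks on the lock for
good, which from the outside looks like processes piling up in 'D'.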
> 
> if you can duplicate, you could try the alt-sysrq-t to try to determine
> which threads are blocked.
> 
> i think cmdebug can be somewhat useful but i have never used it.  check
> the list archives.
> 
> In message <20050112200340.GY20400@kalmia.hozed.org>,Troy Benjegerdes writes:
> >It seems so.. it's been running serving imap for about 3 weeks with no
> >issues until the quota thing.
> >
> >On Wed, Jan 12, 2005 at 03:01:45PM -0500, chas williams - CONTRACTOR wrote:
> >> no idea i am afraid.  has the 'bad refcount 0' gone away?
> >> 
> >> In message <20050112194118.GX20400@kalmia.hozed.org>,Troy Benjegerdes writes:
> >> >I just had a problem now (with openafs-1.3.74) when a user went over
> >> >their quota deleting imap mail messages. (imapd decided to copy the
> >> >messages to the user's trash folder).
> >> >
> >> >Several of the imapd processes ended up hung in the 'D' state, and I
> >> >wound up having to reboot.
> >> >
> >> >Is there anything in 1.3.77 that might fix this?
> >> >
> >> >Thanks.
> >> >
> >> >On Wed, Nov 24, 2004 at 11:39:16AM -0500, chas williams (contractor) wrote:
> >> >> In message <20041124163244.GA17697@kalmia.hozed.org>,Troy Benjegerdes writes:
> >> >> >I'm running a 2.4.27 debian kernel with stock 1.3.74 (built from source).
> >> >> >
> >> >> >Can you point me to some details on the lock inversion fix (mail list
> >> >> >archives would be fine).. I wonder if that didn't fix it.
> >> >> 
> >> >> it might have.  however, i have seen people with problems even after the
> >> >> lock inversion patch.  usually it's gconfd exiting and trying to clean
> >> >> up after itself that seems to trigger these bad refcount == 0 problems.
> >> >> 
> >> >> ------- Forwarded Message
> >> >> 
> >> >> Message-Id: <200406211630.i5LGUHeZ020889@ginger.cmf.nrl.navy.mil>
> >> >> To: openafs-devel@openafs.org
> >> >> In-Reply-To: Message from Jan-Marc Pilawa <j.pilawa@tu-bs.de>
> >> >>    of "Fri, 18 Jun 2004 13:38:40 +0200." <200406181338.40992.j.pilawa@tu-bs.de>
> >> >> From: "chas williams (contractor)" <chas@cmf.nrl.navy.mil>
> >> >> Subject: [OpenAFS-devel] Re: [OpenAFS] Linux kernel panic, OpenAFS client, gconf
> >> >> Date: Mon, 21 Jun 2004 12:30:19 -0400
> >> >> 
> >> >> [i am moving this discussion to -devel]
> >> >> 
> >> >> looking at this gconf crashing problem i see that
> >> >> afs_linux_dentry_revalidate() needs to hold the big kernel lock (BKL).
> >> >> however if the dentry is bad, during the shrink_dcache_parent()/d_drop()
> >> >> operation we might clear the inode associated with the dentry.
> >> >> this calls afs_dentry_iput() which calls osi_iput() which will try to
> >> >> grab AFS_GLOCK.
> >> >> 
> >> >> well that's bad.  you could drop the AFS_GLOCK before dropping the BKL so
> >> >> that everything is fine but that is probably a poor idea since you could
> >> >> be changing the lock ordering.  after looking at this some more, i am fairly
> >> >> convinced afs is getting the locks in the wrong order in the first place.
> >> >> 
> >> >> anyway, in fs/namei.c:real_lookup() we have:
> >> >> 
> >> >>                         lock_kernel();
> >> >>                         result = dir->i_op->lookup(dir, dentry);
> >> >>                         unlock_kernel();
> >> >> 
> >> >> and lookup() is afs_linux_lookup() which grabs AFS_GLOCK.  so it
> >> >> seems clear the right order might be lock_kernel(); AFS_GLOCK(); and not
> >> >> what we currently have.  if we make this change then dentry_revalidate()
> >> >> is cleanly fixed by dropping AFS_GLOCK before calling shrink/d_drop,
> >> >> while still holding the BKL.
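
The ordering proposed in the quoted text can be mimicked in user space as
a rough sketch. Two plain Python locks stand in for the BKL and AFS_GLOCK,
and the functions are simplified stand-ins named after the kernel paths
mentioned above, not the real code. One difference to keep in mind: the
real BKL can be taken recursively, while threading.Lock cannot.

```python
import threading

bkl = threading.Lock()        # stand-in for the big kernel lock
afs_glock = threading.Lock()  # stand-in for AFS_GLOCK


def iput():
    """Stand-in for osi_iput(): takes AFS_GLOCK on its own."""
    with afs_glock:
        return "iput"


def lookup():
    """Same order as real_lookup() -> afs_linux_lookup(): BKL, then AFS_GLOCK."""
    with bkl:
        with afs_glock:
            return "lookup"


def revalidate(dentry_is_bad):
    """Proposed order: BKL outermost, AFS_GLOCK dropped before the d_drop step."""
    with bkl:
        with afs_glock:
            pass  # AFS-side validity checks would run here
        # AFS_GLOCK is released and the BKL is still held, so the
        # iput() path can take AFS_GLOCK without blocking on ourselves.
        if dentry_is_bad:
            return iput()
        return "valid"


def run_workers(count=8, rounds=200):
    """Hammer both paths from several threads; return how many finished."""
    done = []

    def worker(i):
        for r in range(rounds):
            revalidate((i + r) % 2 == 0)
            lookup()
        done.append(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return len(done)


if __name__ == "__main__":
    print(run_workers(), "workers finished")
```

Because every path takes the BKL stand-in first and the AFS_GLOCK stand-in
second, no thread can hold one while waiting for the other in the opposite
order. Calling iput() while still holding afs_glock would hang here, which
is the user-space equivalent of the problem described in the quoted text.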
> >> >> 
> >> >> comments?
> >> >> 
> >> >> [on a historical note i probably added most of the lock_kernel()'s anyway
> >> >> so it's my fault really.]
> >> >> _______________________________________________
> >> >> OpenAFS-devel mailing list
> >> >> OpenAFS-devel@openafs.org
> >> >> https://lists.openafs.org/mailman/listinfo/openafs-devel
> >> >> 
> >> >> 
> >> >> ------- End of Forwarded Message
> >> >> 
> >> >
> >> >-- 
> >> >--------------------------------------------------------------------------
> >> >Troy Benjegerdes                'da hozer'                hozer@hozed.org  
> >> >
> >> >Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> >> >software stuff and not get a real job. Charles Schulz had the best answer:
> >> >
> >> >"Why do musicians compose symphonies and poets write poems? They do it
> >> >because life wouldn't have any meaning for them if they didn't. That's why
> >> >I draw cartoons. It's my life." -- Charles Schulz
> >> >
> >

-- 
--------------------------------------------------------------------------
Troy Benjegerdes                'da hozer'                hozer@hozed.org  

Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:

"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz