[OpenAFS-devel] Openafs-1.3.74 file delete race condition?
Troy Benjegerdes
hozer@hozed.org
Thu, 13 Jan 2005 10:54:12 -0600
I seem to have reproducible problems with openafs 1.3.74 on an SMP
machine serving imap from Maildirs in AFS. It looks like either a lock is
stuck, or two processes are racing to move a file from the maildir tmp
directory to somewhere else.
imap:~# ps -eo pid,user,s,nwchan,wchan=WIDE-WCHAN-COLUMN,args --sort user | grep brett
8500 brett D 1079c3 down /usr/bin/imapd Maildir
8504 brett S 98ced1 ? /usr/bin/imapd Maildir
8551 brett D 1079c3 down /usr/bin/imapd Maildir
8561 brett D 1079c3 down /usr/bin/imapd Maildir
8565 brett S 98ced1 ? /usr/bin/imapd Maildir
8569 brett S 98ced1 ? /usr/bin/imapd Maildir
8573 brett D 1079c3 down /usr/bin/imapd Maildir
8581 brett S 98ced1 ? /usr/bin/imapd Maildir
8596 brett D 1079c3 down /usr/bin/imapd Maildir
8608 brett D 1079c3 down /usr/bin/imapd Maildir
8618 brett D 1079c3 down /usr/bin/imapd Maildir
8741 brett S 157c39 select /usr/bin/imapd Maildir
8747 brett D 1079c3 down /usr/bin/imapd Maildir
8755 brett D 1079c3 down /usr/bin/imapd Maildir
8772 brett S 157c39 select /usr/bin/imapd Maildir
8843 root S 1504e8 pipe_wait grep brett
imap:~#
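A listing like the one above is easier to read once it is tallied by process state and wait channel. The following is only a sketch in Python: the column order (pid, user, state, nwchan, wchan, args) is taken from the ps invocation above, and the helper name tally_states is made up for illustration.

```python
from collections import Counter

# The ps output quoted above, copied verbatim.
PS_OUTPUT = """\
8500 brett D 1079c3 down /usr/bin/imapd Maildir
8504 brett S 98ced1 ? /usr/bin/imapd Maildir
8551 brett D 1079c3 down /usr/bin/imapd Maildir
8561 brett D 1079c3 down /usr/bin/imapd Maildir
8565 brett S 98ced1 ? /usr/bin/imapd Maildir
8569 brett S 98ced1 ? /usr/bin/imapd Maildir
8573 brett D 1079c3 down /usr/bin/imapd Maildir
8581 brett S 98ced1 ? /usr/bin/imapd Maildir
8596 brett D 1079c3 down /usr/bin/imapd Maildir
8608 brett D 1079c3 down /usr/bin/imapd Maildir
8618 brett D 1079c3 down /usr/bin/imapd Maildir
8741 brett S 157c39 select /usr/bin/imapd Maildir
8747 brett D 1079c3 down /usr/bin/imapd Maildir
8755 brett D 1079c3 down /usr/bin/imapd Maildir
8772 brett S 157c39 select /usr/bin/imapd Maildir
8843 root S 1504e8 pipe_wait grep brett
"""


def tally_states(ps_text):
    """Count processes per (state, wchan) pair.

    Assumes each line is: pid user state nwchan wchan args...
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # skip shell prompts, headers and blank lines
        _pid, _user, state, _nwchan, wchan, _args = fields
        counts[(state, wchan)] += 1
    return counts


if __name__ == "__main__":
    for (state, wchan), n in sorted(tally_states(PS_OUTPUT).items()):
        print(state, wchan, n)
```

On this listing the tally shows nine imapd processes in state D, all sleeping in down.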
imap:~# cmdebug localhost
** Cache entry @ 0xf8a6cb60 for 134.536871010.1277.285830
[<my-afs-cell>]
locks: (writer_waiting, 3 read_locks(pid:8581), 1 waiters)
18432 bytes DV 1013276 refcnt 1
callback f697a3e0 expires 1105643469
0 opens 0 writers
normal file
states (0x1), stat'd
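The "expires" value in the cmdebug entry can be turned into a readable time. The sketch below assumes the number is an absolute time in seconds since the Unix epoch; that reading is an assumption, not something cmdebug states in the output above, and the helper name expiry_utc is made up.

```python
from datetime import datetime, timezone

# "callback ... expires" value copied from the cmdebug output above.
CALLBACK_EXPIRES = 1105643469


def expiry_utc(epoch_seconds):
    """Interpret the value as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(expiry_utc(CALLBACK_EXPIRES).isoformat())
```

Read that way, the callback expires on 2005-01-13 at 19:11:09 UTC, a little over two hours after this message was sent.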
From /proc/ksyms:
f898cbb0 afs_osi_InitWaitHandle [libafs-2.4.27-1-686-smp.mp]
f898cbd0 afs_osi_CancelWait [libafs-2.4.27-1-686-smp.mp]
f898cc10 afs_osi_Wait [libafs-2.4.27-1-686-smp.mp]
f898cdc0 afs_osi_SleepSig [libafs-2.4.27-1-686-smp.mp] <-----
f898cf40 afs_osi_Sleep [libafs-2.4.27-1-686-smp.mp]
f898d180 afs_osi_Wakeup [libafs-2.4.27-1-686-smp.mp]
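The arrow above comes from matching the nwchan value 98ced1 against the symbol addresses. A sketch of that lookup in Python follows. It assumes ps printed only the low six hex digits of the address, so the full value would be 0xf898ced1; that is an assumption. Note also that /proc/ksyms lists only exported symbols, so the real function could be an unexported one sitting between two entries. The helper name symbol_for is made up.

```python
import bisect

# Symbol addresses copied from the /proc/ksyms excerpt above.
KSYMS = [
    (0xF898CBB0, "afs_osi_InitWaitHandle"),
    (0xF898CBD0, "afs_osi_CancelWait"),
    (0xF898CC10, "afs_osi_Wait"),
    (0xF898CDC0, "afs_osi_SleepSig"),
    (0xF898CF40, "afs_osi_Sleep"),
    (0xF898D180, "afs_osi_Wakeup"),
]


def symbol_for(addr, table):
    """Return (name, offset) of the last symbol starting at or below addr.

    Returns None if addr lies below every symbol in the table.
    """
    ordered = sorted(table)
    starts = [start for start, _name in ordered]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = ordered[i]
    return name, addr - start


if __name__ == "__main__":
    # Assumed full address for the truncated nwchan value 98ced1.
    name, offset = symbol_for(0xF898CED1, KSYMS)
    print(f"{name}+{offset:#x}")
```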
And that inode reported by cmdebug looks to be the following directory:
ls -id Maildir/.Trash-Work/tmp/
6423805 Maildir/.Trash-Work/tmp/
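The match between the cmdebug entry and this directory can be checked with a little arithmetic. The sketch below assumes the client derives the inode number from the FID (cell.volume.vnode.unique) as (volume << 16) + vnode, truncated to 32 bits. That formula is an assumption about this client rather than something stated here, and the helper name fid_to_inode is made up.

```python
# FID copied from the cmdebug cache entry above: cell.volume.vnode.unique
FID = "134.536871010.1277.285830"


def fid_to_inode(volume, vnode):
    """Assumed mapping: (volume << 16) + vnode, kept to 32 bits."""
    return ((volume << 16) + vnode) & 0xFFFFFFFF


def parse_fid(fid):
    """Split a dotted FID string into four integers."""
    cell, volume, vnode, unique = (int(part) for part in fid.split("."))
    return cell, volume, vnode, unique


if __name__ == "__main__":
    _cell, volume, vnode, _unique = parse_fid(FID)
    print(fid_to_inode(volume, vnode))
```

With the values above this gives 6423805, the same number ls -id printed for Maildir/.Trash-Work/tmp/.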
On Wed, Jan 12, 2005 at 03:12:15PM -0500, chas williams - CONTRACTOR wrote:
> well the stuff stuck in the 'D' state is likely a deadlock somewhere
> in the afs code due to a mishandled error condition (like one of the
> afs function calls returns EWHATEVER) and the code returns before
> releasing some important lock.
>
> if you can duplicate, you could try the alt-sysrq-t to try to determine
> which threads are blocked.
>
> i think cmdebug can be somewhat useful but i have never used it. check
> the list archives.
>
> In message <20050112200340.GY20400@kalmia.hozed.org>,Troy Benjegerdes writes:
> >It seems so.. it's been running serving imap for about 3 weeks with no
> >issues until the quota thing.
> >
> >On Wed, Jan 12, 2005 at 03:01:45PM -0500, chas williams - CONTRACTOR wrote:
> >> no idea i am afraid. has the 'bad refcount 0' gone away?
> >>
> >> In message <20050112194118.GX20400@kalmia.hozed.org>,Troy Benjegerdes writes:
> >> >I just had a problem now (with openafs-1.3.74) when a user went over
> >> >their quota deleting imap mail messages. (imapd decided to copy the
> >> >messages to the user's trash folder).
> >> >
> >> >Several of the imapd processes ended up hung in the 'D' state, and I
> >> >wound up having to reboot.
> >> >
> >> >Is there anything in 1.3.77 that might fix this?
> >> >
> >> >Thanks.
> >> >
> >> >On Wed, Nov 24, 2004 at 11:39:16AM -0500, chas williams (contractor) wrote:
> >> >> In message <20041124163244.GA17697@kalmia.hozed.org>,Troy Benjegerdes writes:
> >> >> >I'm running a 2.4.27 debian kernel with stock 1.3.74 (built from source).
> >> >> >
> >> >> >Can you point me to some details on the lock inversion fix (mail list
> >> >> >archives would be fine).. I wonder if that didn't fix it.
> >> >>
> >> >> it might have. however, i have seen people with problems even after the
> >> >> lock inversion patch. usually it's gconfd exiting and trying to clean
> >> >> up after itself that seems to trigger these bad refcount == 0 problems.
> >> >>
> >> >> ------- Forwarded Message
> >> >>
> >> >> Message-Id: <200406211630.i5LGUHeZ020889@ginger.cmf.nrl.navy.mil>
> >> >> To: openafs-devel@openafs.org
> >> >> In-Reply-To: Message from Jan-Marc Pilawa <j.pilawa@tu-bs.de>
> >> >> of "Fri, 18 Jun 2004 13:38:40 +0200." <200406181338.40992.j.pilawa@tu-bs.de>
> >> >> From: "chas williams (contractor)" <chas@cmf.nrl.navy.mil>
> >> >> Subject: [OpenAFS-devel] Re: [OpenAFS] Linux kernel panic, OpenAFS client, gconf
> >> >> Date: Mon, 21 Jun 2004 12:30:19 -0400
> >> >>
> >> >> [i am moving this discussion to -devel]
> >> >>
> >> >> looking at this gconf crashing problem i see that
> >> >> afs_linux_dentry_revalidate() needs to hold the big kernel lock (BKL).
> >> >> however if the dentry is bad, during the shrink_dcache_parent()/d_drop()
> >> >> operation we might clear the inode associated with the dentry.
> >> >> this calls afs_dentry_iput() which calls osi_iput() which will try to
> >> >> grab AFS_GLOCK.
> >> >>
> >> >> well that's bad. you could drop the AFS_GLOCK before dropping the BKL so
> >> >> that everything is fine but that is probably a poor idea since you would
> >> >> be changing the lock ordering. after looking at this some more, i am fairly
> >> >> convinced afs is getting the locks in the wrong order in the first place.
> >> >>
> >> >> anyway, in fs/namei.c:real_lookup() we have:
> >> >>
> >> >> lock_kernel();
> >> >> result = dir->i_op->lookup(dir, dentry);
> >> >> unlock_kernel();
> >> >>
> >> >> and lookup() is afs_linux_lookup() which grabs AFS_GLOCK. so it
> >> >> seems clear the right order might be lock_kernel(); AFS_GLOCK(); and not
> >> >> what we currently have. if we make this change then dentry_revalidate()
> >> >> is cleanly fixed by dropping AFS_GLOCK before calling shrink/d_drop
> >> >> while still holding the BKL.
> >> >>
> >> >> comments?
> >> >>
> >> >> [on a historical note i probably added most of the lock_kernel()'s anyway
> >> >> so it's my fault really.]
> >> >> _______________________________________________
> >> >> OpenAFS-devel mailing list
> >> >> OpenAFS-devel@openafs.org
> >> >> https://lists.openafs.org/mailman/listinfo/openafs-devel
> >> >>
> >> >>
> >> >> ------- End of Forwarded Message
> >> >>
> >> >
> >> >--
> >> >--------------------------------------------------------------------------
> >> >Troy Benjegerdes 'da hozer' hozer@hozed.org
> >> >
> >> >Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
> >> >software stuff and not get a real job. Charles Schulz had the best answer:
> >> >
> >> >"Why do musicians compose symphonies and poets write poems? They do it
> >> >because life wouldn't have any meaning for them if they didn't. That's why
> >> >I draw cartoons. It's my life." -- Charles Schulz
> >> >
> >
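The lock-ordering argument in the forwarded message above can be illustrated with a small checker that flags two code paths taking the same pair of locks in opposite orders. This is a sketch in Python, not kernel code. The acquisition orders in it are a reading of the forwarded message (lookup takes the BKL and then AFS_GLOCK, while revalidate is described as currently taking them the other way round), and all function and variable names are made up for illustration.

```python
from itertools import combinations


def find_inversions(paths):
    """Report lock pairs that two code paths acquire in opposite orders.

    paths maps a code-path name to the list of locks it takes, in order.
    Returns a list of (first_path, second_path, (lock_a, lock_b)) where
    first_path takes lock_a before lock_b and second_path does the reverse.
    """
    seen = {}  # (earlier, later) -> first path seen taking them in that order
    inversions = []
    for name, locks in paths.items():
        for a, b in combinations(locks, 2):
            if (b, a) in seen:
                inversions.append((seen[(b, a)], name, (b, a)))
            seen.setdefault((a, b), name)
    return inversions


# Orders as read from the forwarded message (assumption, see above).
CURRENT_ORDER = {
    "real_lookup -> afs_linux_lookup": ["BKL", "AFS_GLOCK"],
    "afs_linux_dentry_revalidate": ["AFS_GLOCK", "BKL"],
}

# Proposed fix: every path takes the BKL first, then AFS_GLOCK.
PROPOSED_ORDER = {
    "real_lookup -> afs_linux_lookup": ["BKL", "AFS_GLOCK"],
    "afs_linux_dentry_revalidate": ["BKL", "AFS_GLOCK"],
}

if __name__ == "__main__":
    print("current: ", find_inversions(CURRENT_ORDER))
    print("proposed:", find_inversions(PROPOSED_ORDER))
```

With a single agreed order no pair is ever taken both ways, which is the property the proposed change is after. Whether that explains the hang reported here is a separate question.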
--
--------------------------------------------------------------------------
Troy Benjegerdes 'da hozer' hozer@hozed.org
Someone asked me why I work on this free (http://www.fsf.org/philosophy/)
software stuff and not get a real job. Charles Schulz had the best answer:
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's why
I draw cartoons. It's my life." -- Charles Schulz