[OpenAFS-devel] Why do afsd daemons loop tightly after receiving a SIGHUP?

Derrick J Brashear shadow@dementia.org
Thu, 30 Aug 2001 17:13:49 -0400 (EDT)


On Thu, 30 Aug 2001, Matt Peterson wrote:

> On Wednesday 29 August 2001 12:23 pm, Derrick J Brashear wrote:
> > On Thu, Aug 02, 2001 at 10:50:15PM -0400, Derek Atkins wrote:
> > > Well, yea.  It looks like we should be able to flush_signals() on the
> > > current thread context.  I _thought_ that's what we were doing
> > > already.  Looking at src/afs/LINUX/osi_sleep.c, in afs_osi_Wait() we
> > > do actually call flush_signals() if osi_TimedSleep() returns non-zero
> > > and aintok (the third argument to afs_osi_Wait()) is zero.
> > >
> > > So, this _should_ be doing the right thing, provided aintok is zero.
> > > And indeed, it definitely looks like all the calls to afs_osi_Wait()
> > > pass zero as the third argument.  So, we should be flushing
> > > the signals.
> >
> > Except in HandleFlock (afs/VNOPS/afs_vnop_flock.c), which passes a
> > non-zero aintok.
> >
> > > AHH, the flush_signals() code is only activated if AFS_GLOBAL_SUNLOCK
> > > is defined.  And that is only defined if AFS_SMP is defined.  This
> > > means that signals are only flushed properly on SMP machines!  I bet
> > > that's the problem.  :)
> >
> > Making the flush_signals() path be active in all cases makes the problem
> > less prevalent but some thread(s) don't follow that code path, so more
> > work is needed.
> >
> 
> Agreed.   Are there plans for this work to be done, or are we waiting for 
> volunteers?  

It's on my list of things to do when I get time, but if you or anyone else
wishes to volunteer it will almost certainly get done sooner.

-D