[OpenAFS-devel] Re: OpenAFS client crashes on RHEL 5.10 and RHEL 6.5

Andrew Deason adeason@sinenomine.net
Wed, 19 Mar 2014 13:02:55 -0500


On Wed, 19 Mar 2014 12:13:24 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Wed, Mar 19, 2014 at 10:46:15AM -0500, Andrew Deason wrote:
> > Via __d_unalias, yeah, that looks like it would make it unsuitable.
> > But the restrictions on directory loops may make this unsuitable
> > anyway.
> > 
> > Is the reasoning for the trylocks just lock ordering?
> 
> Yes, if we waited for those locks at this point in the code, we could
> deadlock, since the lock-holder we're waiting on might be waiting on
> the i_mutex of the parent directory, which we already hold.  I haven't
> thought very hard about how to fix that yet.
> 
> > It seems odd to me that lookups for anything using
> > d_materialise_unique can just fail at random.
> 
> Agreed. I think this hasn't been fixed just because it's very rarely
> hit in practice (in the absence of multiply-linked directories like
> you have).  We can reproduce the failure on NFS with artificial
> testing.  It requires an uncached lookup to happen at the same time as
> a cross-directory rename from another client.
> 
> > I don't suppose there's any way the GPLONLY restriction on those
> > interfaces could change? Or do you have any other suggestions or
> > anything as to an approach we should be taking?
> 
> I don't really know that code, apologies.  My only suggestion
> would be to look at the in-kernel NFS and AFS code and see how it does
> this.

They rely on the in-kernel mechanisms I was referring to (really, that's
the only sane way to do it), so looking at them has not been very
helpful.

> As for GPL_ONLY symbols, I think it's certainly worth compiling a list
> and asking.  Obviously the kernel community (myself included!) would
> be much happier to see effort invested on upstream code.  But obvious
> oversights (d_materialise_unique looks like one) do get fixed.

Where would such a request go? LKML? Does anyone on the OpenAFS side
know whether we've ever asked for that before, or have any other
opinions on it? I assume Red Hat or any other downstream will not (or
cannot) change any such restriction that upstream does not change.

In case it's not clear, though, this isn't a simple oversight or
something that happens to be GPLONLY when other similar symbols are
not. This is basically an entire subsystem that, as far as I know, has
always been GPL-restricted. That suggests to me that the chances of
changing the restriction are close to zero, but I honestly don't
understand the various motivations for restricting or unrestricting
interfaces, so I can't pretend to know what will actually happen.

-- 
Andrew Deason
adeason@sinenomine.net