[OpenAFS] Re: Openafs kernel problems

Richard Brittain Richard.Brittain@dartmouth.edu
Wed, 12 Feb 2014 14:17:37 -0500


On Tue, 11 Feb 2014, Andrew Deason wrote:

> On Tue, 4 Feb 2014 09:38:27 -0500
> Dave Botsch <botsch@cnf.cornell.edu> wrote:
>
>> kernel:         2.6.32-431.3.1.el6.x86_64
>
> Just a little bit of more information about this. Apparently the
> problematic code was also introduced in the RHEL 6.4 kernel series
> (2.6.32-358*), but was quickly pulled out. Evidently from this thread it
> was added again in the 6.5 series (2.6.32-431*) and maybe isn't coming
> back out. I'm not entirely sure, since I can't find a changelog entry
> for this, and redpatch.git is currently not helpful since it hasn't
> caught up to the most recent versions yet.

One more data point -- we upgraded a client to 2.6.32-431.3.1.el6.x86_64
and began getting intermittent kernel Oopses when a daily script ran
'gzip -c bigfile | tee copyfile > /afs/some/path/in/afs'
Red Hat weren't interested because of the tainted stack trace, but when I
removed the AFS client and ran the same pipeline to local disk, 'gzip'
turned into an unkillable process consuming 100% CPU and no longer doing
any I/O.
We never generated an actual kernel crash without AFS.

The gzip binary is unchanged, and older kernels work fine.  I couldn't
reproduce it without the '| tee', so apparently that is needed to tickle
the bug.
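
For anyone who wants to try this on their own clients, here is a rough
sketch of a wrapper around the same pipeline. The function name
run_pipeline, its arguments and the 600 second default limit are all
hypothetical, not what our daily script actually does. It only assumes
timeout(1) from GNU coreutils, which exits with status 124 when the limit
is reached.

```shell
# Hypothetical reproducer sketch: run 'gzip -c SRC | tee COPY > DEST'
# under a time limit so that a hung pipeline is reported instead of
# blocking the caller forever. All paths are supplied by the caller.
run_pipeline() {
    src=$1              # file to compress
    copy=$2             # copy written by tee
    dest=$3             # final destination (a path in AFS in the report)
    limit=${4:-600}     # seconds before we call it a hang (assumed value)

    # The pipeline runs in a child shell so timeout(1) can supervise it.
    # The three paths are passed as positional parameters to avoid
    # quoting problems inside the single-quoted script.
    timeout "$limit" sh -c 'gzip -c "$1" | tee "$2" > "$3"' sh \
        "$src" "$copy" "$dest"
    status=$?

    if [ "$status" -eq 124 ]; then
        echo "pipeline exceeded ${limit}s: possible hang" >&2
    fi
    return "$status"
}
```

Called as 'run_pipeline bigfile copyfile /afs/some/path/in/afs', it returns
the pipeline's exit status, or 124 on timeout. Since the stuck gzip was
unkillable here, it may well survive the signal that timeout sends, so a
status of 124 should be followed by a look at 'ps' and the kernel log.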

  Richard

> If anyone wants to try to ask Red Hat about it, you'd be asking about
> what versions include functionality related to the upstream Linux commit
> I mentioned earlier (7732a557b1342c6e6966efb5f07effcf99f56167). I assume
> they won't change anything for us, but it's always helpful to know what
> they're going to be doing with it, if they have a specific plan or
> timeline in mind.

-- 
Richard Brittain,  Research Computing Group,
                    IT Services, 37 Dewey Field Road, HB6219
                    Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085