[OpenAFS] Re: Openafs kernel problems
Richard Brittain
Richard.Brittain@dartmouth.edu
Wed, 12 Feb 2014 14:17:37 -0500
On Tue, 11 Feb 2014, Andrew Deason wrote:
> On Tue, 4 Feb 2014 09:38:27 -0500
> Dave Botsch <botsch@cnf.cornell.edu> wrote:
>
>> kernel: 2.6.32-431.3.1.el6.x86_64
>
> Just a little more information about this. Apparently the
> problematic code was also introduced in the RHEL 6.4 kernel series
> (2.6.32-358*), but was quickly pulled out. Evidently from this thread it
> was added again in the 6.5 series (2.6.32-431*) and maybe isn't coming
> back out. I'm not entirely sure, since I can't find a changelog entry
> for this, and redpatch.git is currently not helpful since it hasn't
> caught up to the most recent versions yet.
One more data point -- we upgraded a client to 2.6.32-431.3.1.el6.x86_64
and began getting intermittent kernel Oopses when a daily script ran
'gzip -c bigfile | tee copyfile > /afs/some/path/in/afs'
Red Hat weren't interested because the stack trace was tainted. However,
when I removed the AFS client and ran the same pipeline writing to local
disk, 'gzip' turned into an unkillable process consuming 100% CPU but no
longer doing any I/O.
We never generated an actual kernel crash without AFS.
The gzip binary is unchanged, and older kernels work fine. I couldn't
reproduce the problem without the '| tee', so apparently that is needed
to tickle the bug.
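For anyone who wants to try this on their own client, the test could be
sketched roughly as below. This is only an illustration under assumptions:
the 16 MB input size, the temporary directory, the OUT, SIZE_MB and LIMIT
variables, and the 60-second limit are all placeholders and not taken from
the daily script. Set OUT to a path in /afs to exercise the AFS case.

```shell
#!/bin/sh
# Sketch only: sizes, paths, and the time limit are placeholders, not the
# values from the original daily script.
WORK=$(mktemp -d)
BIGFILE="$WORK/bigfile"
COPY="$WORK/copyfile"
OUT="${OUT:-$WORK/out.gz}"    # set OUT to a path under /afs for the AFS case

# Build a test input (16 MB of random data unless SIZE_MB is set).
dd if=/dev/urandom of="$BIGFILE" bs=1M count="${SIZE_MB:-16}" 2>/dev/null

# Same shape as the failing command, wrapped in 'timeout' so that a stuck
# pipeline is reported instead of blocking the shell indefinitely.
timeout "${LIMIT:-60}" sh -c 'gzip -c "$1" | tee "$2" > "$3"' sh \
    "$BIGFILE" "$COPY" "$OUT"
status=$?

if [ "$status" -eq 124 ]; then
    echo "pipeline exceeded the time limit (possible hang); look for a spinning gzip"
elif [ "$status" -ne 0 ]; then
    echo "pipeline failed with status $status"
elif cmp -s "$COPY" "$OUT" && gzip -t < "$OUT"; then
    echo "pipeline completed and both copies are intact"
else
    echo "pipeline completed but the outputs differ or are corrupt"
fi
# The files are left in $WORK for inspection; remove the directory by hand.
```

If gzip really is unkillable, 'timeout' can only report that the limit was
exceeded; the process itself will still be there afterwards. Dropping the
'| tee "$2"' stage gives the variant that did not trigger the problem.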
Richard
> If anyone wants to try to ask Red Hat about it, you'd be asking about
> what versions include functionality related to the upstream Linux commit
> I mentioned earlier (7732a557b1342c6e6966efb5f07effcf99f56167). I assume
> they won't change anything for us, but it's always helpful to know what
> they're going to be doing with it, if they have a specific plan or
> timeline in mind.
--
Richard Brittain, Research Computing Group,
IT Services, 37 Dewey Field Road, HB6219
Dartmouth College, Hanover NH 03755
Richard.Brittain@dartmouth.edu 6-2085