[OpenAFS-devel] Re: Crash on AIX 6.1 TL6 SP3
Niklas Edmundsson
Niklas.Edmundsson@hpc2n.umu.se
Thu, 24 Feb 2011 10:57:40 +0100 (MET)
On Wed, 23 Feb 2011, Andrew Deason wrote:
> On Wed, 23 Feb 2011 10:57:46 -0500
> Todd DeSantis <atd@us.ibm.com> wrote:
>
>> The problems with these crashes would start in TL5. And as
>> Andrew mentions, there was probably another change in the socket
>> structures responsible for this.
>>
>> If the AFS binaries are compiled on an AIX 6.1 TL5 machine,
>> then they will work on the TL5 and TL6 versions of the OS.
>
> Is there any easy way we can detect this change at runtime (or just the
> version and TL), so we could refuse to load or return some kind of error
> instead of panicking the machine?
Intriguing. AIX, as far as I know, has a rather good history of not
breaking compatibility of loadable modules within a major release.
Wild guesses follow:
The crash happens in m_free(), which suggests that only the mbuf
handling has changed, and that all goes well until the allocated
buffer is freed.
Comparing sys/mbuf.h on TL4 and TL6, I see that there is an expansion
flag field named m_flags2 which isn't initialized by the TL4 MGETHDR
allocation macro, even though it exists in the struct.
If we're lucky, we might be able to make builds done on an early TL
work on a later TL just by setting m_flags2=0 ourselves.
I'll have to find some time to test all this, starting with building
on TL6 and verifying that it indeed works for me before doing more random
experiments ;)
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke@hpc2n.umu.se
---------------------------------------------------------------------------
Sheesh! You start havin' fun, and they send the lawyers!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=