[OpenAFS-devel] Re: Crash on AIX 6.1 TL6 SP3

Niklas Edmundsson Niklas.Edmundsson@hpc2n.umu.se
Thu, 24 Feb 2011 10:57:40 +0100 (MET)


On Wed, 23 Feb 2011, Andrew Deason wrote:

> On Wed, 23 Feb 2011 10:57:46 -0500
> Todd DeSantis <atd@us.ibm.com> wrote:
>
>> The problems with these crashes would start in TL5.  And as
>> Andrew mentions, there was probably another change in the socket
>> structures responsible for this.
>>
>> If the AFS binaries are compiled on an AIX 6.1 TL5 machine,
>> then they will work on the TL5 and TL6 versions of the OS.
>
> Is there any easy way we can detect this change at runtime (or just the
> version and TL), so we could refuse to load or return some kind of error
> instead of panicking the machine?
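
On the detection question, here is a rough userspace sketch of what a
pre-load check could look like. It assumes the level is available as a
string in the form printed by "oslevel -s" (for example 6100-06-03-1048),
and that the only rule we care about is the one Todd describes: on 6.1,
binaries built before TL5 should not be loaded on TL5 or later. The
names sketch_level, sketch_parse_level and sketch_may_load are made up
for illustration, and refusing on a release mismatch is my own
assumption. How a kernel extension would get at the level itself is not
covered here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a parsed level string (not an AIX type). */
struct sketch_level {
    unsigned release;  /* e.g. 6100 for AIX 6.1 */
    unsigned tl;       /* technology level */
    unsigned sp;       /* service pack */
};

/*
 * Parse a string of the assumed form "RRRR-TL-SP[-...]".
 * Returns 0 on success, -1 on missing or malformed input.
 */
static int sketch_parse_level(const char *s, struct sketch_level *out)
{
    assert(out != NULL);
    if (s == NULL)
        return -1;
    if (sscanf(s, "%4u-%2u-%2u", &out->release, &out->tl, &out->sp) != 3)
        return -1;
    return 0;
}

/*
 * Return 1 if loading looks safe, 0 if we should refuse.
 * Assumed rules: refuse across different releases, and refuse a
 * pre-TL5 build of 6.1 on a TL5-or-later system.
 */
static int sketch_may_load(const struct sketch_level *build,
                           const struct sketch_level *run)
{
    if (build->release != run->release)
        return 0;
    if (build->release == 6100 && build->tl < 5 && run->tl >= 5)
        return 0;
    return 1;
}
```

A loader script or wrapper could parse the build-time level recorded
with the binaries and the running level, then skip loading the
extension when sketch_may_load() returns 0.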

Intriguing. AIX, as far as I know, has a rather good history of not 
breaking compatibility of loadable modules within a major release.

Wild guesses follow:

The crash happens in m_free(), which suggests that only the mbuf
handling has changed somehow, and that all goes well until the
allocated buffer is freed.

Comparing sys/mbuf.h on TL4 and TL6, I see that there is an expansion
flag field named m_flags2 which isn't initialized by the TL4 MGETHDR
allocation macro, even though it exists in the struct.

If we're lucky, we might be able to make builds from an early TL work
on a later TL just by setting m_flags2=0 ourselves.
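
To make the idea concrete, here is a small userspace mock of it. Nothing
in it is the real AIX code: struct sketch_mbuf, SKETCH_PKTHDR,
SKETCH_OLD_MGETHDR and sketch_gethdr are invented stand-ins, and only
the field name m_flags2 comes from the header. The mock macro imitates
an allocation macro that sets m_flags but leaves m_flags2 holding
whatever was in memory, and the wrapper clears the field afterwards.
Whether this is enough to cure the real crash is exactly the part that
is untested.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct mbuf; NOT the real AIX layout. */
struct sketch_mbuf {
    int  m_flags;
    int  m_flags2;     /* expansion flags field named in sys/mbuf.h */
    char m_dat[64];
};

#define SKETCH_PKTHDR 0x0002   /* made-up flag value for the mock */

/*
 * Mimics an older allocation macro: it sets m_flags but never touches
 * m_flags2. The memset simulates stale data in a recycled buffer.
 */
#define SKETCH_OLD_MGETHDR(m) do {                                    \
    (m) = (struct sketch_mbuf *)malloc(sizeof(struct sketch_mbuf));   \
    if ((m) != NULL) {                                                \
        memset((m), 0xAA, sizeof(struct sketch_mbuf));                \
        (m)->m_flags = SKETCH_PKTHDR;                                 \
    }                                                                 \
} while (0)

/* Wrapper that allocates via the old macro and clears m_flags2 itself. */
static struct sketch_mbuf *sketch_gethdr(void)
{
    struct sketch_mbuf *m;

    SKETCH_OLD_MGETHDR(m);
    if (m != NULL)
        m->m_flags2 = 0;

    /* Postcondition: any buffer we hand out has a clean m_flags2. */
    assert(m == NULL || m->m_flags2 == 0);
    return m;
}
```

In the real code the equivalent would be one extra assignment after each
header allocation, so it should be cheap to try.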

I'll have to find some time to test all this, starting with building
on TL6 and verifying that it indeed works for me before doing more
random experiments ;)


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke@hpc2n.umu.se
---------------------------------------------------------------------------
  Sheesh! You start havin' fun, and they send the lawyers!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=