[OpenAFS] crash on AIX 5.2

Hans-Gunther Borrmann hans-gunther.borrmann@rz.uni-freiburg.de
Thu, 13 Jan 2005 12:48:31 +0100


On Tuesday 11 January 2005 17:45, Hartmut Reuter wrote:
> Jeffrey Altman wrote:
> > Hartmut Reuter wrote:
> >> I am in the process of tracking down all differences between my good
> >> version and 1.3.77.
> >>
> >> I am now not very distant from 1.3.77, and at least one problem seems
> >> to be the new code in afs_pioctl.c for get and set tokens along with
> >> the huge ticket size introduced for compatibility with Active Directory.
> >> Keeping the old ticket size and the old code for tokens in afs_pioctl.c
> >> results in a fairly stable client. At least I can get a token, make
> >> clean in the openafs-tree and make dest without crashing the system.
> >> This is certainly not enough testing for putting it into production,
> >> but a hint where the problem may be hidden.
> >>
> >> Hartmut
> >
> > We know the problem is in the set/get token code on AIX.  More than
> > likely the stack is too small to support a 12000 byte object and it
> > is getting blown away on AIX.  The question is:
> >
> >   * where is this object that is located on the stack?
> >
> > If you can find that, then you will have solved the bug.
>
> Does not look like stack overflow. The crash always happens in xmalloc1:
>
> (0)> f
> pvthread+00A500 STACK:
> [006021F0]xmalloc1+0007AC (0000000000000200, F10000E00C22E000,
>     0000000000000000, F10000E00C22E000, 0000000000000400, F10000E03B964269,
>     0000000000000002, 00000000003E4338 [??])
> [00606B70]xmalloc+000208 (??, ??, ??)
> [08E41978]afs_osi_Alloc+00005C (??)
> [08EBC6DC]afs_HandlePioctl+0003D4 (0000000000000000, 800C5608800C5608,
>     F00000002FF3A400, 0000000000000000, F00000002FF3A438)
> [08EC74F8]afs_syscall_pioctl+000294 (0000000000000000, 800C5608800C5608,
>     000000002FF21FC0, 0000000000000000)
> [08E46000]syscall+0001A0 (0000001400000014, 0000000000000000,
>     800C5608800C5608, 2FF21FC02FF21FC0, 0000000000000000, 2E6D70672E6D7067,
>     0000008000000080)
> [08E45DB8]lpioctl+000050 (0000000000000000, 800C5608800C5608,
>     000000002FF21FC0, 0000000000000000)
> [0000379C]sc_msr_2_point+000028 ()
> Not a valid dump data area @ 2FF21CF0
> (0)>
>
> So storage on the kernel heap was probably overwritten.
>
> Hartmut
>
> > Jeffrey Altman
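
A crash inside the allocator (xmalloc1) rather than in the function
that owns the buffer would fit the heap overwrite theory: an earlier
copy may have run past the end of an allocated block, and the damage
only shows up at the next allocation. This is not confirmed by the
trace above. As a minimal sketch of the kind of length check that
guards against it, assuming a ticket of up to 12000 bytes is copied
into a buffer sized for the old, smaller tickets. The names
copy_ticket_checked, BIG_TICKET_LEN and SMALL_BUF_LEN, and the value
1024, are hypothetical and are not taken from afs_pioctl.c:

```c
#include <stddef.h>
#include <string.h>

/* 12000 is the ticket size mentioned in the thread.
 * SMALL_BUF_LEN is an assumed size for an older, smaller buffer. */
#define BIG_TICKET_LEN 12000
#define SMALL_BUF_LEN  1024

/*
 * Copy src_len bytes of ticket data into dst only if they fit.
 * Returns 0 on success, -1 if the arguments are invalid or the copy
 * would write past the end of dst.
 */
int copy_ticket_checked(char *dst, size_t dst_len,
                        const char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || src_len > dst_len)
        return -1;              /* refuse instead of overrunning dst */
    memcpy(dst, src, src_len);
    return 0;
}
```

An unchecked memcpy in the same place would silently corrupt whatever
follows the buffer on the heap, so comparing the buffer sizes in the
get/set token paths against the new ticket size may be a reasonable
place to start looking.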

-- 
________________________________________________________________
Hans-Gunther Borrmann <hans-gunther.borrmann@rz.uni-freiburg.de>
Rechenzentrum der Universitaet Freiburg
Hermann-Herder-Str. 10, D79104 FREIBURG
Tel.: +49 761/203-4652
Fax:  +49 761/203-4643