[OpenAFS-devel] 1.3.79 on AIX 5.2, system dump when using token

Niklas Edmundsson Niklas.Edmundsson@hpc2n.umu.se
Thu, 10 Mar 2005 12:56:39 +0100 (MET)


On Fri, 25 Feb 2005, Michael Niksch wrote:

>> Michael: You might want to try my patch and see how much things improve for 
>> you.
>
> As far as I can tell, the machine dies just as quickly as it did without the 
> patch. The kernel dump also looks similar.

That's odd, it changed things considerably for me on a 32-bit 
machine running AIX 5.1...

I enabled kernel memory debugging (bosdebug -M ; bosboot -a ; reboot) 
in order to try to further pinpoint the problem, and these are my 
findings:

* The machine still doesn't dump when I use my "large stack"-patch
   on 1.3.78 (it should dump if the xmalloc debug thingie detects an error).
* Going back to an unpatched 1.3.78 it dumps with the xmalloc debug
   message "A program has tried to access freed xmalloc memory". In
   both dumps below, the crash occurs after obtaining an AFS token and
   then trying to access AFS using that token. The "kdb stat" output
   from both dumps is at the bottom of this post.

Does this give a hint as to what's wrong? I'm wading through the source 
at random without finding anything obvious. All those #ifdefs make the 
thing rather hard to read :/

Also, I'm no kdb guru and I haven't found any good howtos either. If 
anyone knows how to extract more useful info out of the thing please 
holler...

Dump 1:
------------------8<-------------------------
(0)> stat
SYSTEM_CONFIGURATION:
POWER_RS2 machine with 1 cpu(s)  (32-bit registers)

SYSTEM STATUS:
sysname... AIX
nodename.. n11
release... 1
version... 5
machine... 000030638100
nid....... 00306381
time of crash: Thu Mar 10 11:01:48 2005
age of system: 11 min., 50 sec.
xmalloc debug: enabled
Debug kernel error message: A program has tried to access freed xmalloc memory.
Address at fault was 0x3DC0E000

CRASH INFORMATION:
CPU 0 CSA 2FF3B400 at time of crash, error code for LEDs: 30000000
pvthread+005B80 STACK:
[08DBC714]memset+000054 ()
[08D99EE8]rxi_Alloc+000140 (00002F34)
[08DE979C]rxkad_NewClientSecurityObject+000080 (00000000, 307DFA24, 00000001, 00000030,
    307DF9C0) 
[08DD0BFC]afs_ConnBySA_7_5+000398 (??, ??, ??, ??, ??, ??, ??, ??)
[08DD0814]afs_Conn+0002A0 (??, ??, ??)
[08DF4B54]afs_DoBulkStat+000BC8 (3D8A68F8, 00000600, 2FF3A8C0)
[08DF359C]afs_lookup+000ED0 (3D8A68F8, 2FF3AB48, 2FF3AB44, 35B46600)
[08DB2CDC]afs_gn_lookup+00004C (3D8A68F8, 2FF3AB44, 2FF3AB48, 00000082,
    00000000, 35B46600)
[08DACFD0]vn_lookup+00009C (3D8A68F8, 2FF3AB44, 2FF3AB48, 00000082,
    00000000, 35B46600)
[002EEE2C]vnop_lookup+000018 (??, ??, ??, ??, ??, ??)
[002C6E74]lookuppn+000494 (??, ??, ??, ??, ??, ??)
[002C7390]lookupname_cur+000090 (??, ??, ??, ??, ??, ??, ??)
[003295F0]statx+000234 (20003F38, 2FF21708, 00000080, 00000009)
[00003A50].sys_call+000000 ()
Not a valid VMM address @ D01E469C
------------------8<-------------------------

Dump 2:
------------------8<-------------------------
(0)> stat
SYSTEM_CONFIGURATION:
POWER_RS2 machine with 1 cpu(s)  (32-bit registers)

SYSTEM STATUS:
sysname... AIX
nodename.. n11
release... 1
version... 5
machine... 000030638100
nid....... 00306381
time of crash: Thu Mar 10 11:38:18 2005
age of system: 34 min., 16 sec.
xmalloc debug: enabled
Debug kernel error message: A program has tried to access freed xmalloc memory.
Address at fault was 0x3453B000

CRASH INFORMATION:
CPU 0 CSA 2FF3B400 at time of crash, error code for LEDs: 30000000
pvthread+005280 STACK:
[09E50714]memset+000054 ()
[09E2DEE8]rxi_Alloc+000140 (00002F34)
[09E7D79C]rxkad_NewClientSecurityObject+000080 (00000000, 307DFAE4, 00000001, 00000030,
    307DF7C0) 
[09E64BFC]afs_ConnBySA_7_5+000398 (??, ??, ??, ??, ??, ??, ??, ??)
[09E64814]afs_Conn+0002A0 (??, ??, ??)
[09E629FC]afs_FetchStatus+000054 (??, ??, ??, ??)
[09E5FEA4]afs_GetVCache+000478 (??, ??, ??, ??)
[09E47F4C]afs_root_nolock+0000E0 (307BAA10, 2FF3AB44)
[09E47A9C]afs_root+000074 (307BAA10, 2FF3AB44)
[09E41E18]vfs_root+000084 (307BAA10, 2FF3AB44, 35A86400)
[003241D0]vfs_root+000018 (??, ??, ??)
[002C6FE8]lookuppn+000608 (??, ??, ??, ??, ??, ??)
[002C7390]lookupname_cur+000090 (??, ??, ??, ??, ??, ??, ??)
[003295F0]statx+000234 (2FF22CF7, 2FF21A08, 00000080, 00000009)
[00003A50].sys_call+000000 ()
Not a valid VMM address @ D01E469C
------------------8<-------------------------
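
For anyone who wants to compare the two stacks mechanically, here is a
minimal Python sketch (not part of the original report). The helper names
parse_frames and common_innermost are made up for illustration, and the
regex assumes the frame layout seen in the kdb output above, i.e.
[address]symbol+offset. Only the top six frames of each dump are pasted in.

```python
import re

# Matches kdb stack frame lines such as "[08D99EE8]rxi_Alloc+000140 (00002F34)".
# Assumption: every frame line starts with an 8-digit hex address in brackets.
FRAME_RE = re.compile(r"^\[([0-9A-Fa-f]{8})\]\.?([A-Za-z_]\w*)\+([0-9A-Fa-f]+)")


def parse_frames(dump_text):
    """Return [(function, offset), ...] with the innermost frame first."""
    frames = []
    for line in dump_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(2), int(m.group(3), 16)))
    return frames


def common_innermost(a, b):
    """Longest run of identical (function, offset) frames from the top of both stacks."""
    shared = []
    for frame_a, frame_b in zip(a, b):
        if frame_a != frame_b:
            break
        shared.append(frame_a)
    return shared


# Top of the stack from Dump 1 (copied from the kdb output).
DUMP1 = """
[08DBC714]memset+000054 ()
[08D99EE8]rxi_Alloc+000140 (00002F34)
[08DE979C]rxkad_NewClientSecurityObject+000080 (00000000, 307DFA24, 00000001, 00000030,
    307DF9C0)
[08DD0BFC]afs_ConnBySA_7_5+000398 (??, ??, ??, ??, ??, ??, ??, ??)
[08DD0814]afs_Conn+0002A0 (??, ??, ??)
[08DF4B54]afs_DoBulkStat+000BC8 (3D8A68F8, 00000600, 2FF3A8C0)
"""

# Top of the stack from Dump 2 (copied from the kdb output).
DUMP2 = """
[09E50714]memset+000054 ()
[09E2DEE8]rxi_Alloc+000140 (00002F34)
[09E7D79C]rxkad_NewClientSecurityObject+000080 (00000000, 307DFAE4, 00000001, 00000030,
    307DF7C0)
[09E64BFC]afs_ConnBySA_7_5+000398 (??, ??, ??, ??, ??, ??, ??, ??)
[09E64814]afs_Conn+0002A0 (??, ??, ??)
[09E629FC]afs_FetchStatus+000054 (??, ??, ??, ??)
"""

shared = common_innermost(parse_frames(DUMP1), parse_frames(DUMP2))
for name, offset in shared:
    print(f"{name}+{offset:#x}")
```

Both stacks share the same five innermost frames at identical offsets
(memset, rxi_Alloc, rxkad_NewClientSecurityObject, afs_ConnBySA_7_5,
afs_Conn), so the two crashes reach the faulting memset through the same
afs_Conn path; only the callers of afs_Conn differ.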

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke@hpc2n.umu.se
---------------------------------------------------------------------------
  "In English, Data." - Crusher
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=