[OpenAFS-devel] 1.4.0 volserver crash on AIX 5.2; pthread issue?

Peter Somogyi psomogyi@gamax.hu
Tue, 22 Nov 2005 15:27:50 +0100


Hi,

I downloaded the 1.4.0 binaries for AIX 5.2 from www.openafs.org, but volserver crashes at startup.

Core was generated by `volserver'.
Program terminated with signal 6, Aborted.
#0  0xd005c604 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
(gdb) bt
#0  0xd005c604 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
#1  0xd005c08c in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5.o)
#2  0xd01eff34 in raise () from /usr/lib/libc.a(shr.o)
#3  0xd02102c8 in abort () from /usr/lib/libc.a(shr.o)
#4  0x100028f0 in AssertionFailed ()
#5  0x10002770 in vFSLog ()
#6  0x10037e00 in Log ()
#7  0x10000c34 in main ()

I disassembled the binary and found that it aborts in vFSLog (src/util/serverLog.c, line 144):
    LOCK_SERVERLOG();
which is a macro for assert(pthread_mutex_lock(&serverLogMutex)==0).
The call comes from main() in volmain.c, line 485: Log("Starting AFS Volserver %s (%s)\n",...
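
Because the assert throws away the return value, the core does not show why pthread_mutex_lock() failed. Below is a minimal sketch of how the return code could be reported before aborting. The names lock_and_report, LOCK_REPORTED and demo_relock are hypothetical and not part of OpenAFS; the error-checking mutex is only there to force a failure portably, and I am not claiming this is what happens on AIX. If the real failure were, for example, EINVAL, that would point towards a mutex the library does not consider initialized, but that is an assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of OpenAFS: lock the mutex and, on failure,
 * print the error number that pthread_mutex_lock() returned. The pthread
 * functions return the error number directly; they do not set errno. */
static int lock_and_report(pthread_mutex_t *m, const char *name)
{
    int rc = pthread_mutex_lock(m);
    if (rc != 0)
        fprintf(stderr, "pthread_mutex_lock(%s) failed: %d (%s)\n",
                name, rc, strerror(rc));
    return rc;
}

/* Hypothetical variant of the locking macro: still aborts on failure,
 * but the return code is printed before assert() fires. */
#define LOCK_REPORTED(m) assert(lock_and_report(&(m), #m) == 0)

/* Demonstration only: relocking an error-checking mutex from the same
 * thread fails with EDEADLK instead of blocking. Returns the result of
 * the second lock attempt, or -1 if the first attempt already failed. */
static int demo_relock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int first, second;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    first = lock_and_report(&m, "m");   /* expected to succeed */
    second = lock_and_report(&m, "m");  /* expected to fail and be reported */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return first == 0 ? second : -1;
}
```

With something like LOCK_REPORTED(serverLogMutex) in place of the plain assert, the error number would be on stderr just before the abort.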

bash-2.05b# cat /usr/afs/local/BosConfig
restarttime 11 0 4 0 0
checkbintime 3 0 5 0 0
bnode fs fs 1
parm /usr/afs/bin/fileserver -m 2 -spare 1048576 -L -udpsize 1310720 -nojumbo -abortthreshold 0
parm /usr/afs/bin/volserver -p 16 -syslog -udpsize 1310720 -nojumbo
parm /usr/afs/bin/salvager -parallel 4 -syslog
end

bash-2.05b# cat BosLog
Tue Nov 22 14:42:21 2005: Server directory access is okay
Tue Nov 22 14:42:24 2005: fs:vol exited on signal 6 (core dumped)
...
Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.
Tue Nov 22 14:42:25 2005: fs:vol exited on signal 6 (core dumped)
Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.
Tue Nov 22 14:42:25 2005: fs:file exited with code 0
Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.

BosLog, FileLog and SalvageLog exist in /usr/afs/logs.

libc version:
  bos.rte.libc              5.2.0.60    C     F    libc Library

By the way, I have already seen a pthread lock assertion like this in a completely different place, also under AIX, but only very rarely (once every 1-2 months).
In volserver, however, I can currently reproduce the error every time.

Any ideas? Does anybody know of any pthread issues under AIX? Or is our libc version wrong?
(Has anybody ever run 1.4.0 on AIX 5.2?)

-- 
Peter Somogyi
Software Developer, Gamax Ltd.
1114 Budapest, Bartok B. u 15/d
Tel.: +36-1-381-0544
e-mail: psomogyi@gamax.hu