[OpenAFS-devel] 1.4.0 volserver crash on AIX 5.2; pthread issue?

Peter Somogyi psomogyi@gamax.hu
Tue, 22 Nov 2005 16:36:48 +0100


Hi Horst,

Thank you very much for the answer.

> I'm sure you played around with the options.
> the -p option looks kinda suspicious, and the udpsize.
> I think this has nothing to do with the pthread_mutex_lock problem.
You are right on that point: without the parameters it works (I've just tried it).
(So directories are OK.)
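
For anyone who wants to check the log directory independently of volserver,
here is a minimal sketch in C. The helper name log_dir_usable is made up for
this mail and is not part of OpenAFS. It only tests that the path is an
existing directory that the real user ID of the process may write into.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not OpenAFS code: returns 1 if `path` is an
 * existing directory that the process's real user ID may write into
 * and search, 0 otherwise. */
int log_dir_usable(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* missing or unreachable */
    if (!S_ISDIR(st.st_mode))
        return 0;               /* exists but is not a directory */
    return access(path, W_OK | X_OK) == 0;
}
```

Called with "/usr/afs/logs" (or "/usr/local/var/openafs/logs" for a build
without transarc paths), a return value of 0 would point at the directory
rather than at pthreads.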

> I don't know how to crash pthread_mutex_lock if you provide a valid
> pointer.
This is not the first time I've seen it under AIX.
(And I've already fixed a bug in openafs regarding pthread_mutex_destroy - not lock - which occurred _only_ under AIX.)
And the disassembly in gdb shows quite clearly that it stopped there.
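
To illustrate the general mechanism (this is not a diagnosis of the volserver
crash): pthread_mutex_lock reports problems through its return value, so it
can fail even when the pointer is valid, and an assert wrapper such as
LOCK_SERVERLOG turns any non-zero result into an abort(). Below is a small
sketch, not the OpenAFS source; demo_mutex, demo_init and demo_relock are
invented names. It uses a POSIX error-checking mutex, where a second lock
from the same thread returns EDEADLK instead of succeeding.

```c
#define _XOPEN_SOURCE 700       /* for PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Illustration only; this is not the OpenAFS code. */
static pthread_mutex_t demo_mutex;

/* Initialise demo_mutex as an error-checking mutex.
 * Returns 0 on success, otherwise the pthread error number. */
int demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock the mutex twice from the same thread and return the result of
 * the second call. The pointer is valid both times, yet the second
 * call fails with EDEADLK on an error-checking mutex. */
int demo_relock(void)
{
    int first, second;

    first = pthread_mutex_lock(&demo_mutex);
    if (first != 0)
        return first;
    second = pthread_mutex_lock(&demo_mutex);
    pthread_mutex_unlock(&demo_mutex);  /* release the lock we do hold */
    return second;
}
```

Wrapping that second call in assert(... == 0) would end in abort(), i.e.
signal 6. Printing the return value instead of asserting on it would at
least tell us which error AIX is reporting.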

So I will ask my bosses whether we really need those buggy options.

Peter

On Tuesday 22 November 2005 16.01, you wrote:
> On Nov 22, 2005, at 3:27 PM, Peter Somogyi wrote:
> > Hi,
> >
> > I've downloaded 1.4.0 binaries from www.openafs.org for AIX 5.2,
> > but volserver crashes.
>
> I don't know very much about these binaries, but I presume they're not
> built in any other way than by using the usual build process.
> Pretty much the same way I build mine.
>
> > Core was generated by `volserver'.
> > Program terminated with signal 6, Aborted.
> > #0  0xd005c604 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
> > (gdb) bt
> > #0  0xd005c604 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
> > #1  0xd005c08c in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5.o)
> > #2  0xd01eff34 in raise () from /usr/lib/libc.a(shr.o)
> > #3  0xd02102c8 in abort () from /usr/lib/libc.a(shr.o)
> > #4  0x100028f0 in AssertionFailed ()
> > #5  0x10002770 in vFSLog ()
> > #6  0x10037e00 in Log ()
> > #7  0x10000c34 in main ()
> >
> > I've disassembled it, and found that it crashes in the function:
> > src/util/serverLog.c: vFSLog, line 144:
> >     LOCK_SERVERLOG();
> > which is a macro for assert(pthread_mutex_lock(&serverLogMutex)==0).
>
> I don't know how to crash pthread_mutex_lock if you provide a valid
> pointer.
>
> > In main(): volmain.c, Line 485, Log("Starting AFS Volserver %s (%s)\n",...
> >
> > bash-2.05b# cat /usr/afs/local/BosConfig
> > restarttime 11 0 4 0 0
> > checkbintime 3 0 5 0 0
> > bnode fs fs 1
> > parm /usr/afs/bin/fileserver -m 2 -spare 1048576 -L -udpsize 1310720 -nojumbo -abortthreshold 0
> > parm /usr/afs/bin/volserver -p 16 -syslog -udpsize 1310720 -nojumbo
> > parm /usr/afs/bin/salvager -parallel 4 -syslog
> > end
>
> I'm sure you played around with the options.
> the -p option looks kinda suspicious, and the udpsize.
> I think this has nothing to do with the pthread_mutex_lock problem.
>
> > bash-2.05b# cat BosLog
> > Tue Nov 22 14:42:21 2005: Server directory access is okay
> > Tue Nov 22 14:42:24 2005: fs:vol exited on signal 6 (core dumped)
> > ...
> > Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.
> > Tue Nov 22 14:42:25 2005: fs:vol exited on signal 6 (core dumped)
> > Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.
> > Tue Nov 22 14:42:25 2005: fs:file exited with code 0
> > Tue Nov 22 14:42:25 2005: BNODE 'fs' repeatedly failed to start, perhaps missing executable.
> >
> > BosLog, FileLog, SalvageLog exist in /usr/afs/logs.
> >
> > libc version:
> >   bos.rte.libc              5.2.0.60    C     F    libc Library
> >
> > By the way, I've already seen such a pthread_lock assertion in a very
> > different place, also under AIX, but very rarely (once every 1-2 months).
> > This error in volserver, however, I can currently reproduce every
> > time.
> >
> > Any ideas? Does anybody know of any pthread issue under AIX? Or
> > is our libc version wrong?
> > (Has anybody ever run 1.4.0 on AIX 5.2?)
>
> I run OpenAFS on AIX 5.2 but the cvs version, as always, not the
> stable one.
>
> BTW, did you check that it writes into /usr/afs/logs?
> It crashes if the directory for the logs doesn't exist.
> If it's compiled without transarc paths (which mine always is) the
> correct path would be /usr/local/var/openafs/logs.
>
> Horst