[OpenAFS] Re: openafs server on freebsd 8.2 amd64: bosserver coredump

Mark mark@nl.simpc.com
Mon, 04 Apr 2011 10:39:37 +0200

On 01-04-11 23:19, Andrew Deason wrote:
> On Fri, 1 Apr 2011 08:56:35 -0400
> Derrick Brashear <shadow@gmail.com> wrote:
>> you're almost certainly better off for FreeBSD using 1.6.0pre4.
> It would still be nice to know what's going on here, if possible.
> Mark, do you not see anything in BosLog (or BosLog.old, etc) when this
> happens? There should be a panic string somewhere that says specifically
> why we're aborting. And could you do this after configuring with
> --enable-debug ?

Can do, it's not in production yet. I did install 1.6.0pre4 on it, and that one runs fine, btw.
Removed 1.6.0pre4 and configured with --enable-debug:
./configure --enable-transarc-paths --enable-namei-fileserver --with-afs-sysname=amd64_fbsd_80 --enable-largefile-fileserver --disable-pam --enable-supergroups --with-krb5-conf=/usr/bin/krb5-config --disable-kernel-module --enable-debug

Ran it locally with localauth ('bos listkeys localhost -localauth') 20 times, no problem.
Ran it remotely 2 times and it crashed on the 2nd attempt; the remote side got this error:
bos: communications failure (-1) error encountered while listing keys
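
For anyone who wants to repeat this kind of test, the repeated runs can be scripted. This is only a minimal sketch: `repeat_until_fail` is a hypothetical helper name, not part of OpenAFS, and it assumes the bos client exits non-zero when it reports an error such as the one above.

```shell
#!/bin/sh
# repeat_until_fail is a hypothetical helper, not part of OpenAFS.
# Usage: repeat_until_fail COUNT COMMAND [ARGS...]
# Runs COMMAND up to COUNT times and stops at the first non-zero exit status.
# Assumption: the command under test signals failure through its exit status.
repeat_until_fail() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Discard normal output; error messages on stderr stay visible.
        if ! "$@" >/dev/null; then
            echo "attempt $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count attempts succeeded"
    return 0
}
```

The local test would then be `repeat_until_fail 20 bos listkeys localhost -localauth`; for the remote case the same call would be made from another machine with the server's name in place of localhost and without -localauth.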

On the server there is nothing in BosLog other than:
Mon Apr  4 10:16:16 2011: Server directory access is okay

Backtrace from the generated bosserver.core:

(gdb) bt
#0  0x000000080077afcc in kill () from /lib/libc.so.7
#1  0x0000000800779dcb in abort () from /lib/libc.so.7
#2  0x000000000041389b in osi_Panic (msg=Variable "msg" is not available.) at rx_user.c:225
#3  0x000000000041e3a4 in AllocPacketBufs (class=Variable "class" is not available.) at rx_packet.c:349
#4  0x000000000041e465 in rxi_AllocDataBuf (p=0x800a4b600, nb=7076, class=Variable "class" is not available.) at rx_packet.c:514
#5  0x000000000041ed0b in rxi_ReadPacket (socket=3, p=0x800a4b600, host=0x800a61f60, port=0x800a61f66) at rx_packet.c:1419
#6  0x00000000004145dc in rxi_ListenerProc (rfds=0x800a63000, tnop=0x800a61fbc, newcallp=0x800a61fb0) at rx_lwp.c:296
#7  0x0000000000414815 in rx_ListenerProc (dummy=Variable "dummy" is not available.) at rx_lwp.c:336
#8  0x0000000000423f14 in Create_Process_Part2 ()

Did the same test with a remote bos getlog on this small BosLog with just 1 line; it also crashed on the 2nd call, with a similar backtrace from the core file.

Ran it with a bigger log file (copied the config.log to BosLog and did a getlog of BosLog.old after starting bosserver); this one crashed during the log transfer. The console printed about 1417 bytes from the file. Again the same backtrace from the core.
If I run the getlog locally on the server with localauth, it works fine.

>>> since bos listkeys localhost -localauth crashed bosserver, but other
>>> bos commands worked. After recreating it with asetkey on the 64bit
>>> system listkeys locally works, but doing bos over the network often
>>> crashes the server still on the first command or else the next.
> That's a little odd, since that crash is at a much lower level than
> which RPC you're running. It may just have to do with how much data is
> involved going over the wire for the command. It might be interesting to
> see if "bos getlog" also crashes, if you try it on a log that has a
> bunch of stuff in it.
I cannot reproduce this any more with the old key file, so it must have been something else (I might have run it without localauth or mixed something up).

Mark Huijgen