[OpenAFS-devel] ptserver coredumps (1.5.78 on FreeBSD 8.1)

u-openafsdev-njsf@aetey.se u-openafsdev-njsf@aetey.se
Wed, 8 Dec 2010 14:22:42 +0100


FWIIW

I am setting up a new cell,
following best practices except for the traditional "cell == lowercase(REALM)"
convention (/usr/local/etc/openafs/server/krb.conf contains the correct realm name).

I compiled OpenAFS 1.5.78 from source on

FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

./configure --enable-namei-fileserver --enable-supergroups

The underlying file systems are ZFS.

(as a side note, the client can start with memcache but is broken/unusable)

The server appears to run OK (it is possible to create volumes and
manipulate the account database; I also tested pts remotely with tokens).

Unfortunately, on an authenticated file system access (accessing
/afs/<cell> with tokens as a member of system:administrators; no ACLs on
root.cell have been changed yet) ptserver dumps core. The access attempt
eventually times out.

/usr/local/var/openafs/logs/BosLog says:
---------------------
# cat /usr/local/var/openafs/logs/BosLog
Wed Dec  8 10:02:08 2010: Core limits now -1 -1
Wed Dec  8 10:02:08 2010: Server directory access is okay
Wed Dec  8 10:03:07 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:04:06 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:05:05 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:06:04 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:08:28 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:09:08 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:09:56 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 10:10:55 2010: ptserver exited on signal 11 (core dumped)
 ...
---------------------
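
To see how quickly bosserver restarts ptserver into the next crash, the
timestamps can be pulled out of BosLog. This is only a minimal sketch: the
helper names crash_times and restart_gaps are made up for illustration, and
it assumes every relevant line has the "timestamp: message" shape shown above.

```python
from datetime import datetime

CRASH_MARKER = "ptserver exited on signal 11"

def crash_times(lines):
    """Return the timestamps of BosLog lines that report a ptserver crash."""
    times = []
    for line in lines:
        if CRASH_MARKER not in line:
            continue
        # The timestamp is everything before the first ": ".
        stamp = line.split(": ", 1)[0]
        # Collapse the padding used for single-digit days before parsing.
        stamp = " ".join(stamp.split())
        times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return times

def restart_gaps(times):
    """Seconds between consecutive crashes."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# A few lines copied from the log above.
SAMPLE = [
    "Wed Dec  8 10:02:08 2010: Server directory access is okay",
    "Wed Dec  8 10:03:07 2010: ptserver exited on signal 11 (core dumped)",
    "Wed Dec  8 10:04:06 2010: ptserver exited on signal 11 (core dumped)",
    "Wed Dec  8 10:05:05 2010: ptserver exited on signal 11 (core dumped)",
    "Wed Dec  8 10:06:04 2010: ptserver exited on signal 11 (core dumped)",
    "Wed Dec  8 10:08:28 2010: ptserver exited on signal 11 (core dumped)",
]

if __name__ == "__main__":
    print(restart_gaps(crash_times(SAMPLE)))
```

On the full log one would pass the lines of the BosLog file instead of SAMPLE.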

Running with MALLOC_OPTIONS=Z does not change this.

---------------------
# gdb /usr/local/libexec/openafs/ptserver /usr/local/var/openafs/logs/ptserver.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ptserver'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000000414dd0 in IDCmp ()
(gdb) bt
#0  0x0000000000414dd0 in IDCmp ()
#1  0x0000000000414f03 in IDCmp ()
#2  0x0000000000408303 in pr_rxstat_userok ()
#3  0x00000000004086e2 in pr_rxstat_userok ()
#4  0x000000000040d950 in pt_mywrite ()
#5  0x00000000004122ca in PR_ExecuteRequest ()
#6  0x000000000043c9ef in rxi_KeepAliveEvent ()
#7  0x00000000004332dd in rx_ServerProc ()
#8  0x0000000000444054 in xdr_array ()
#9  0xf7f6f5f4f3f2f1f0 in ?? ()
#10 0xfffefdfcfbfaf9f8 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
 ...
---------------------

OK, I wanted to give FreeBSD an extra chance.

I compiled 1.4.12.1, which needed some tweaking to build,
but in the end I see:
---------------------
# cat /usr/local/var/openafs/logs/BosLog
Wed Dec  8 13:38:11 2010: Server directory access is okay
Wed Dec  8 13:38:34 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:38:36 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:38:44 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:38:54 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:04 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:14 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:25 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:36 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:47 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:56 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:59 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:39:59 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:06 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:11 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:16 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:23 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:26 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:36 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:36 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:47 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:48 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:40:58 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:41:01 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:41:09 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:41:15 2010: ptserver exited on signal 11 (core dumped)
Wed Dec  8 13:41:18 2010: ptserver exited on signal 11 (core dumped)
 ...
---------------------
which seems to be a different issue:
---------------------
# gdb /usr/local/libexec/openafs/ptserver /usr/local/var/openafs/logs/ptserver.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ptserver'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000000415570 in free_map ()
(gdb) bt
#0  0x0000000000415570 in free_map ()
#1  0x00000000004156a3 in bic_bitmap ()
#2  0x0000000000407f13 in GetListSG2 ()
#3  0x00000000004082f2 in GetList ()
#4  0x000000000040b9ba in getCPS ()
#5  0x000000000040bded in SPR_GetCPS ()
#6  0x0000000000412a8a in PR_ExecuteRequest ()
#7  0x0000000000439e0a in rxi_ServerProc ()
#8  0x000000000043187d in rx_ServerProc ()
#9  0x0000000000441374 in Create_Process_Part2 ()
#10 0xf7f6f5f4f3f2f1f0 in ?? ()
#11 0xfffefdfcfbfaf9f8 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
 ...
---------------------
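
Since neither binary has debugging symbols, the function names gdb prints
are not necessarily trustworthy. One rough check is to compare the distances
between return addresses in the two traces. The sketch below does that for
the top four frames; the helper names frames and gap are made up for
illustration, and equal distances would only hint at (not prove) the same
code path in both versions.

```python
import re

# Matches gdb "bt" lines such as "#0  0x0000000000414dd0 in IDCmp ()".
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+) in (\S+) \(\)")

def frames(bt_text):
    """Return (address, symbol) pairs for each frame in gdb 'bt' output."""
    return [(int(addr, 16), sym) for _, addr, sym in FRAME_RE.findall(bt_text)]

def gap(frame_list, inner, outer):
    """Address distance between frame `outer` and frame `inner`."""
    return frame_list[outer][0] - frame_list[inner][0]

# Top four frames copied from the two backtraces in this message.
BT_1_5_78 = """\
#0  0x0000000000414dd0 in IDCmp ()
#1  0x0000000000414f03 in IDCmp ()
#2  0x0000000000408303 in pr_rxstat_userok ()
#3  0x00000000004086e2 in pr_rxstat_userok ()
"""

BT_1_4_12_1 = """\
#0  0x0000000000415570 in free_map ()
#1  0x00000000004156a3 in bic_bitmap ()
#2  0x0000000000407f13 in GetListSG2 ()
#3  0x00000000004082f2 in GetList ()
"""

if __name__ == "__main__":
    new, old = frames(BT_1_5_78), frames(BT_1_4_12_1)
    for inner, outer in [(0, 1), (2, 3)]:
        print(inner, outer, hex(gap(new, inner, outer)), hex(gap(old, inner, outer)))
```

For the frames above, both distances come out the same in the two traces
(0x133 between frames 0 and 1, 0x3df between frames 2 and 3), so the 1.5.78
crash may well be in the same place as the 1.4.12.1 one, with the 1.5.78
symbol names being misattributed. A build with debugging symbols would be
needed to confirm this.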

MALLOC_OPTIONS=Z does not help either.

Ok, somebody has presumably applied all necessary tweaks and got this to work:
 http://wiki.freebsd.org/afs-server
which wants to compile 1.4.7

Alas, the compilation ends with
---------------------
+ cc -shared -o libafssetpag.so.1.0 picobj/setpag.o picobj/glue.o syscall.o
/usr/bin/ld: picobj/setpag.o: relocation R_X86_64_32 can not be used when making a shared object; recompile with -fPIC
picobj/setpag.o: could not read symbols: Bad value
*** Error code 1

Stop in /usr/ports/net/openafs-server/work/openafs-1.4.7/src/sys.
*** Error code 1

Stop in /usr/ports/net/openafs-server/work/openafs-1.4.7/src/sys.
*** Error code 1

Stop in /usr/ports/net/openafs-server/work/openafs-1.4.7.
*** Error code 1

Stop in /usr/ports/net/openafs-server/work/openafs-1.4.7.
*** Error code 1

Stop in /usr/ports/net/openafs-server/work/openafs-1.4.7.
*** Error code 1

Stop in /usr/ports/net/openafs-server.
---------------------

I get the feeling that FreeBSD is hardly well supported, not even as
a server platform? Fine with me, but I will miss ZFS if I have
to switch to Linux.

Maybe it's me who is missing something important, but I did what
I could :)

Regards,
Rune