[OpenAFS-devel] OpenAFS on FreeBSD 14.1
Benjamin Kaduk
kaduk@mit.edu
Mon, 4 Nov 2024 20:16:06 -0800
On Mon, Nov 04, 2024 at 06:00:00PM +0000, Ben Huntsman wrote:
> Hi there!
> Has anyone looked at getting OpenAFS working on FreeBSD 14.1 lately? They've changed a bunch of stuff. I managed to get it to compile, and I can load the kernel module and start bosserver, but as soon as I run bos setcell, I get a kernel panic. In fact, any command which takes a -server argument causes a panic.
I don't have a FreeBSD-14 system up yet, no.
> Here's a backtrace from a dump:
>
> First of all, the stack backtrace while starting kgdb:
>
> ...
> KDB: stack backtrace:
> #0 0xffffffff80b7fbfd at kdb_backtrace+0x5d
> #1 0xffffffff80b32961 at vpanic+0x131
> #2 0xffffffff80b32823 at panic+0x43
> #3 0xffffffff80fff91b at trap_fatal+0x40b
> #4 0xffffffff80fff966 at trap_pfault+0x46
> #5 0xffffffff80fd6a48 at calltrap+0x8
> #6 0xffffffff832e3dfb at afs_syscall_call+0x1627
That suggests that some basic processing in the syscall stub is accessing
memory in ways it shouldn't, as if we were trying to dereference a
userspace pointer rather than copyin() the data it points to. But I did
not think that anything big had changed in FreeBSD that would cause us to
start breaking spontaneously...
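
For illustration, the copyin() discipline mentioned above can be sketched in plain userspace C. This is only a sketch under stated assumptions: demo_copyin is a hypothetical stand-in that mimics the copyin(9) contract (return 0 on success, EFAULT on a bad address), and demo_args, demo_user_space, and demo_syscall are invented names, not anything in the OpenAFS tree. The point is that the handler only ever reads its private copy of the arguments and never dereferences the caller-supplied pointer directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument block that a syscall receives by pointer. */
struct demo_args {
    long opcode;
    long param;
};

/* Hypothetical stand-in for the range of valid user addresses. */
static unsigned char demo_user_space[64];

/*
 * Userspace stand-in for copyin(9): copy len bytes from a "user" address
 * into a kernel-side buffer. Returns 0 on success, or EFAULT if the source
 * range is not entirely inside demo_user_space.
 */
static int demo_copyin(const void *uaddr, void *kaddr, size_t len)
{
    uintptr_t start = (uintptr_t)demo_user_space;
    uintptr_t u = (uintptr_t)uaddr;

    if (u < start || len > sizeof(demo_user_space) ||
        u - start > sizeof(demo_user_space) - len)
        return EFAULT;
    memcpy(kaddr, uaddr, len);
    return 0;
}

/*
 * Hypothetical syscall handler: copy the arguments in first, then work
 * only on the local copy. The user pointer itself is never dereferenced.
 */
static int demo_syscall(const void *uargs, long *result)
{
    struct demo_args args;
    int error = demo_copyin(uargs, &args, sizeof(args));

    if (error != 0)
        return error;
    *result = args.opcode + args.param;
    return 0;
}
```

A handler that instead cast uargs to a struct pointer and read through it would fault in the kernel on a bad address rather than returning EFAULT to the caller.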
> #7 0xffffffff832485c3 at afs3_syscall+0x89
> #8 0xffffffff8100073b at amd64_syscall+0x67b
> #9 0xffffffff80fd735b at fast_syscall_common+0xf8
> ...
>
> And the backtrace:
>
> (kgdb) backtrace
> #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
> #2 0xffffffff80b324f7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
> #3 0xffffffff80b329ce in vpanic (fmt=0xffffffff8115edb8 "%s", ap=ap@entry=0xfffffe005d6dfa10)
> at /usr/src/sys/kern/kern_shutdown.c:967
> #4 0xffffffff80b32823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
> #5 0xffffffff80fff91b in trap_fatal (frame=0xfffffe005d6dfaf0, eva=200) at /usr/src/sys/amd64/amd64/trap.c:952
> #6 0xffffffff80fff966 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>,
> ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
> #7 <signal handler called>
> #8 0xffffffff832bb1d3 in rxi_FindIfnet () from /usr/vice/etc/libafs.ko
> #9 0x0000000000000000 in ?? ()
>
>
> rxi_FindIfnet is in src/rx/rx_kcommon.c. One thing that I noticed that jumps out at me is that on FreeBSD, the code calls CURVNET_SET. This same call and some similar ones are also present in a section of src/afs/afs_server.c that needed some changes due to removed calls in FreeBSD 14.1.
CURVNET_SET is something we do need to be doing and is FreeBSD-specific,
but keeping track of when to set/restore it is a place where we've had
issues in the past.
Looking at the specific call makes me wonder if the global rx_socket is
actually initialized at this point.
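
One hedged way to read the trap_fatal frame in the kgdb output: if eva=200 is the faulting address, a small value like that is what a member access through a NULL structure pointer produces, since the address touched is just the member's offset. The sketch below shows the arithmetic in userspace C. The layout is hypothetical and is NOT the real struct socket; demo_socket, demo_pad, demo_vnet, and demo_fault_address are invented for illustration, with padding chosen so the member lands at offset 200.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, not the real struct socket: the padding is sized
 * so that demo_vnet sits at byte offset 200.
 */
struct demo_socket {
    char demo_pad[200];
    void *demo_vnet;
};

/*
 * Address that a read of base->demo_vnet would touch, computed
 * arithmetically so that nothing is actually dereferenced.
 */
static uintptr_t demo_fault_address(uintptr_t base)
{
    return base + offsetof(struct demo_socket, demo_vnet);
}
```

With a base of 0, demo_fault_address returns the member offset itself, which is why a low fault address is consistent with (though not proof of) an uninitialized global pointer.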
> However, I'm much less familiar with FreeBSD than I am with AIX, so I would appreciate a pointer if I'm looking in the right direction or this is a red herring.
Getting the debug build working might be more fruitful than staring at a
coarse-grained backtrace or debug-via-printf.
> Debugging on this version of FreeBSD is also made more difficult as trying to build the OpenAFS tree with debug enabled results in many errors similar to this:
>
> usr/bin/ctfconvert -g -l openafs .libs/camellia.o
> ERROR: ctfconvert: rc = 1 Unsupported version [_dwarf_info_load(229)]
I am not sure whether the CTF support got any real exercise on FreeBSD, so
I would be inclined to suggest modifying src/config/cc-wrapper.in to just
exit early and see if that lets the build progress. I would want to get it
tracked down and fixed eventually, but it seems like it might be unrelated
to your proximate issues.
-Ben