[OpenAFS-devel] OpenAFS on FreeBSD 14.1

Benjamin Kaduk kaduk@mit.edu
Mon, 4 Nov 2024 20:16:06 -0800


On Mon, Nov 04, 2024 at 06:00:00PM +0000, Ben Huntsman wrote:
> Hi there!
>    Has anyone looked at getting OpenAFS working on FreeBSD 14.1 lately?  They've changed a bunch of stuff.  I managed to get it to compile, and I can load the kernel module and start bosserver, but as soon as I run bos setcell, I get a kernel panic.  In fact, any command which takes a -server argument causes a panic.

I don't have a FreeBSD-14 system up yet, no.

>    Here's a backtrace from a dump:
> 
> First of all, the stack backtrace while starting kgdb:
> 
> ...
> KDB: stack backtrace:
> #0 0xffffffff80b7fbfd at kdb_backtrace+0x5d
> #1 0xffffffff80b32961 at vpanic+0x131
> #2 0xffffffff80b32823 at panic+0x43
> #3 0xffffffff80fff91b at trap_fatal+0x40b
> #4 0xffffffff80fff966 at trap_pfault+0x46
> #5 0xffffffff80fd6a48 at calltrap+0x8
> #6 0xffffffff832e3dfb at afs_syscall_call+0x1627

That suggests that some basic processing in the syscall stub is accessing
memory in ways it shouldn't, as if we were trying to dereference a
userspace pointer rather than copyin() the data it points to.  But I did
not think that anything big had changed in FreeBSD that would cause us to
start breaking spontaneously...
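To illustrate that distinction, here is a hypothetical userspace model. It is not the afs_syscall_call() code, and every model_* name is invented for the sketch. FreeBSD's real copyin(9) has the form `int copyin(const void *udaddr, void *kaddr, size_t len)` and returns EFAULT for a bad user address. The model mimics that contract by treating a "user pointer" as an offset into a separate buffer, which the kernel side may only read through a range-checked copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: "user memory" is a separate buffer, and a user
 * pointer is just an offset into it. The kernel side must never treat
 * that value as one of its own pointers; it goes through a checked copy.
 */
struct model_user_space {
    const unsigned char *base; /* backing store for the simulated process */
    size_t size;               /* number of valid bytes */
};

/* Simulated copyin(): validate the whole range, then copy into kernel memory. */
int model_copyin(const struct model_user_space *us, uintptr_t uaddr,
                 void *kaddr, size_t len)
{
    assert(us != NULL && kaddr != NULL);
    if (uaddr > us->size || len > us->size - uaddr)
        return EFAULT; /* same error the real copyin() reports */
    memcpy(kaddr, us->base + uaddr, len);
    return 0;
}

/* Hypothetical syscall argument block. */
struct model_args {
    int32_t opcode;
    int32_t param;
};

/* A syscall stub that copies its arguments in before touching any field. */
int model_syscall(const struct model_user_space *us, uintptr_t uargs,
                  struct model_args *out)
{
    struct model_args kargs; /* kernel-side copy */
    int error = model_copyin(us, uargs, &kargs, sizeof(kargs));

    if (error != 0)
        return error;
    *out = kargs;
    return 0;
}
```

Reading through the user value directly, instead of via the checked copy, is the pattern that would fault inside the kernel.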

> #7 0xffffffff832485c3 at afs3_syscall+0x89
> #8 0xffffffff8100073b at amd64_syscall+0x67b
> #9 0xffffffff80fd735b at fast_syscall_common+0xf8
> ...
> 
> And the backtrace:
> 
> (kgdb) backtrace
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
> #2  0xffffffff80b324f7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
> #3  0xffffffff80b329ce in vpanic (fmt=0xffffffff8115edb8 "%s", ap=ap@entry=0xfffffe005d6dfa10)
>     at /usr/src/sys/kern/kern_shutdown.c:967
> #4  0xffffffff80b32823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
> #5  0xffffffff80fff91b in trap_fatal (frame=0xfffffe005d6dfaf0, eva=200) at /usr/src/sys/amd64/amd64/trap.c:952
> #6  0xffffffff80fff966 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>,
>     ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
> #7  <signal handler called>
> #8  0xffffffff832bb1d3 in rxi_FindIfnet () from /usr/vice/etc/libafs.ko
> #9  0x0000000000000000 in ?? ()
> 
> 
> rxi_FindIfnet  is in src/rx/rx_kcommon.c.  One thing that I noticed that jumps out at me is that on FreeBSD, the code calls CURVNET_SET.  This same call and some similar ones are also present in a section of src/afs/afs_server.c that needed some changes due to removed calls in FreeBSD 14.1.

CURVNET_SET is something we do need to be doing and is FreeBSD-specific,
but keeping track of when to set/restore it is a place where we've had
issues in the past.
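As we understand FreeBSD's <net/vnet.h>, CURVNET_SET() saves the previous curvnet in a local variable that CURVNET_RESTORE() later reads back. The two must therefore be paired within one function scope and on every exit path. (Without VIMAGE, both compile away.) Below is a hypothetical userspace model of that discipline. The model_* names are invented, and this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnet; only its identity matters here. */
struct model_vnet {
    const char *name;
};

/* Stand-in for the per-thread "current vnet" pointer. */
static _Thread_local struct model_vnet *model_curvnet = NULL;

/*
 * Modeled on the shape of the FreeBSD macros: SET declares a local holding
 * the previous value, so RESTORE must appear later in the same scope.
 */
#define MODEL_CURVNET_SET(arg)                               \
    struct model_vnet *model_saved_vnet = model_curvnet;     \
    model_curvnet = (arg)
#define MODEL_CURVNET_RESTORE() (model_curvnet = model_saved_vnet)

/* Hypothetical worker: fails early on request but still restores. */
int model_do_work(struct model_vnet *vnet, int fail)
{
    int error = 0;

    MODEL_CURVNET_SET(vnet);
    assert(model_curvnet == vnet);
    if (fail) {
        error = -1;
        goto out; /* single exit path keeps SET/RESTORE paired */
    }
    /* ... work that consults model_curvnet would go here ... */
out:
    MODEL_CURVNET_RESTORE();
    return error;
}

struct model_vnet *model_current_vnet(void)
{
    return model_curvnet;
}
```

A bare `return -1;` in place of the `goto out` would leave the thread pointing at the wrong vnet after the call. That is the class of set/restore bug being described.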

Looking at the specific call makes me wonder whether the global rx_socket is
actually initialized at this point.
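The backtrace is consistent with that suspicion. trap_fatal() reports eva=200 (0xc8), a faulting address just above zero, which is what reading a field through a NULL structure pointer typically looks like. The following is a hypothetical sketch of the kind of guard one might add while testing the theory. The model_* names stand in for rx_socket and its vnet field and are not OpenAFS's actual rxi_FindIfnet() code. The field offset in the model is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket stand-in: some leading fields, then the one we want. */
struct model_socket {
    char other_fields[64]; /* arbitrary size, for illustration only */
    void *so_vnet;
};

/* Hypothetical global playing the role of rx_socket before init has run. */
static struct model_socket *model_rx_socket = NULL;

/*
 * Guarded lookup. Without the NULL check, model_rx_socket->so_vnet would
 * read from address offsetof(struct model_socket, so_vnet): a small,
 * non-zero address, like the one in the panic.
 */
void *model_socket_vnet(void)
{
    if (model_rx_socket == NULL)
        return NULL; /* caller must treat this as "not initialized yet" */
    return model_rx_socket->so_vnet;
}

void model_set_socket(struct model_socket *so)
{
    model_rx_socket = so;
}
```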

> However, I'm much less familiar with FreeBSD than I am with AIX, so I would appreciate a pointer if I'm looking in the right direction or this is a red herring.

Getting the debug build working might be more fruitful than staring at a
coarse-grained backtrace or debugging via printf.

> Debugging on this version of FreeBSD is also made more difficult as trying to build the OpenAFS tree with debug enabled results in many errors similar to this:
> 
> usr/bin/ctfconvert -g -l openafs .libs/camellia.o ERROR: ctfconvert: rc = 1 Unsupported version [_dwarf_info_load(229)]

I am not sure whether the CTF support got any real exercise on FreeBSD, so
I would be inclined to suggest modifying src/config/cc-wrapper.in to just
exit early and see if that lets the build progress.  I would want to get it
tracked down and fixed eventually, but it seems like it might be unrelated
to your proximate issues.

-Ben