[OpenAFS-port-freebsd] Re: FreeBSD 5-current client work....

Garrett Wollman wollman@khavrinen.lcs.mit.edu
Thu, 25 Sep 2003 13:39:20 -0400 (EDT)


<<On Thu, 25 Sep 2003 12:21:37 -0400, Jim Rees <rees@umich.edu> said:

> You should join the port-freebsd@openafs.org list, and post there instead of
> here, at least until the client is working and committed to cvs.  Feel free
> to send diffs directly to me if you want.  If you can clear up the 4.x vs
> 5.x issue for me (see below) I'll start committing.

Moving the thread over there....

>   - It is essential that libafs.ko be compiled using the kernel's actual
>   option headers (opt_global.h in particular) and not fake ones.

> Agreed, but we need to find a better way.  In particular, I don't understand
> "-I/sys/i386/compile/AFS."  Where does that come from?  At least use
> @BSD_KERNEL_PATH@ instead of /sys.  See MakefileProto.OBSD.in for an
> example.

The issue I'm trying to solve is not finding the kernel sources, it's
finding the kernel compilation directory, where important header files
like vnode_if.h and opt_global.h can be found.  The former can (and
ultimately should) be generated automatically; the latter *must* match
the running kernel if there is to be any hope of being able to debug
the client module.  For non-debugging builds, it's probably safe to
assume that the options will be the same as in the GENERIC kernel.

>   - Most of the necessary fixes were simply telling the FreeBSD port to
>   use the same setup as OpenBSD.  It will never be possible (or
>   sensible) for AFS to ``subclass'' struct vnode in FreeBSD.

> But that's not true for 4.x, right?

It is true of 4.x, too, but the problems are sufficiently papered over
that it will work on 4.x because 4.x doesn't have the same level of
internal consistency and invariant checking that 5.x has, and the data
structures are such that you can more easily get away with not
initializing them properly.  So 4.x probably worked only
serendipitously.  Conceptually, the models are the same for 4.x and
5.x except that 5.x does more locking, so IMAO the 4.x and 5.x clients
should behave the same way with respect to vnodes.
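
To make the alternative to "subclassing" concrete, here is a minimal
userland sketch of the association the OpenBSD-style setup implies: the
kernel owns the vnode, and AFS hangs its own per-file structure off the
vnode's private-data pointer, keeping a back pointer the other way.  The
struct layouts, afs_attach() and afs_detach() are hypothetical stand-ins
for illustration, not code from the actual diff, and real kernel code
would need the appropriate locking around both operations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct vnode; only the field used here. */
struct vnode {
    void *v_data;            /* private per-filesystem data */
};

/* Hypothetical cut-down AFS cache entry; the real one is far larger. */
struct vcache {
    struct vnode *v;         /* back pointer to the kernel-owned vnode */
    int fid;                 /* placeholder for the AFS file identifier */
};

/* Conversions go through pointers instead of relying on struct layout. */
#define VTOAFS(vp)  ((struct vcache *)(vp)->v_data)
#define AFSTOV(avc) ((avc)->v)

/* Tie a cache entry to a vnode that the kernel allocated for us. */
static void
afs_attach(struct vcache *avc, struct vnode *vp)
{
    assert(vp->v_data == NULL);   /* vnode must not already be claimed */
    avc->v = vp;
    vp->v_data = avc;
}

/* Break the association, e.g. when the vnode is being reclaimed. */
static void
afs_detach(struct vnode *vp)
{
    struct vcache *avc = VTOAFS(vp);

    if (avc != NULL)
        avc->v = NULL;
    vp->v_data = NULL;
}
```

The point of the indirection is that the kernel is free to initialize,
lock, and recycle the vnode on its own terms; AFS never has to pretend
that it knows how a struct vnode is laid out or allocated.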

> Your diff changes this for 4.x, and introduces some other things
> (like curthread) that make me think it won't work on 4.x.  I don't
> have 4.x here to test with.

It's possible that I didn't generate enough #ifdef spaghetti to avoid
breaking 4.x; this is an integration issue, and I'm still in the
hack/run/debug process.

>   In order to do that, while avoiding race conditions, I
>   turned afs_global_lock into afs_global_mtx.

> Why?  As you found out, this introduces problems.  Why not leave it as-is?

Because there's (at least the appearance of) a horrible race condition
in several functions in osi_sleep.c if it's not.  Even better, making
it a mutex gives better debugging (thanks to WITNESS) and allows us to
use the kernel's built-in synchronization primitives (like condition
variables) rather than having to kluge them.

It's clear that the old code really actually wants a mutex and not a
reader-writer lock in any case.  In 4.x, lockmgr() was all we had.
Now we have a full suite of mutexes, condvars, shared/exclusive locks,
and semaphores, in addition to the old-style lockmgr() locks.
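
As a rough illustration of why a mutex plus a condition variable closes
the sleep/wakeup race, here is a userland sketch using POSIX threads as
a stand-in for the kernel primitives (in the kernel the corresponding
calls would be along the lines of mtx_lock() and cv_wait()).  The names
global_mtx, struct event, event_sleep(), event_wakeup() and
waker_thread() are made up for this example and are not taken from
osi_sleep.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for the single global AFS mutex. */
static pthread_mutex_t global_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical event object: a condvar plus the condition it guards. */
struct event {
    pthread_cond_t cv;
    bool posted;
};
#define EVENT_INITIALIZER { PTHREAD_COND_INITIALIZER, false }

/*
 * Sleep until the event is posted.  Caller holds global_mtx.  The wait
 * releases the mutex and blocks as one atomic step, so a wakeup cannot
 * slip in between "check the condition" and "go to sleep".
 */
static void
event_sleep(struct event *ev)
{
    while (!ev->posted) {
        int rc = pthread_cond_wait(&ev->cv, &global_mtx);
        assert(rc == 0);
        (void)rc;
    }
    ev->posted = false;      /* consume the event */
}

/* Post the event and wake all sleepers.  Caller holds global_mtx. */
static void
event_wakeup(struct event *ev)
{
    ev->posted = true;
    pthread_cond_broadcast(&ev->cv);
}

/* Helper thread body: take the global mutex and post the event. */
static void *
waker_thread(void *arg)
{
    pthread_mutex_lock(&global_mtx);
    event_wakeup(arg);
    pthread_mutex_unlock(&global_mtx);
    return NULL;
}
```

A sleeper that holds global_mtx, starts waker_thread() and then calls
event_sleep() is guaranteed to see the wakeup, because the waker cannot
acquire the mutex until the sleeper is actually waiting.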

>   - It seems as if, with a decent kernel memory allocator, almost all of
>   afs_osi_alloc() could be thrown out.

> The traditional allocator is the set of osi_Alloc routines in
> afs_osi_alloc.c.  These are there because in the old days kernels didn't
> have memory allocators at all.

> The "new" way is to call afs_osi_Alloc(), which usually turns into the
> native kernel allocator for whatever system you're on.  Correct me if I'm
> wrong.

Ultimately, afs_osi_Alloc() calls AFS_KALLOC() after mangling its
arguments, which expands into malloc(size, M_AFS, M_WAITOK) on
FreeBSD.  On Linux, though, it just calls osi_linux_alloc(x, 1) -- and
(this is the important thing that I did not notice last night) that
second argument tells osi_linux_alloc() that it's safe to drop the
global lock!  Now that I'm clear on this I think I'll be able to make
some better progress tonight.
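
The pattern implied by that second argument can be sketched in userland
as follows.  This is only an illustration under stated assumptions:
sketch_alloc(), glock(), gunlock() and the global_lock_held flag are
hypothetical names, plain malloc() stands in for a kernel allocation
that may sleep, and nothing here is copied from the Linux or FreeBSD
port.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Userland stand-in for the AFS global lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static int global_lock_held;     /* bookkeeping for this sketch only */

static void
glock(void)
{
    pthread_mutex_lock(&global_lock);
    global_lock_held = 1;
}

static void
gunlock(void)
{
    assert(global_lock_held);    /* must own the lock to release it */
    global_lock_held = 0;
    pthread_mutex_unlock(&global_lock);
}

/*
 * Allocate memory.  If drop_glock is nonzero, the caller holds the
 * global lock and permits us to release it around the allocation, so
 * other threads are not stalled while the allocator sleeps.
 */
static void *
sketch_alloc(size_t size, int drop_glock)
{
    void *p;

    if (drop_glock)
        gunlock();
    p = malloc(size);            /* stands in for a sleeping allocation */
    if (drop_glock)
        glock();
    return p;
}
```

The catch for callers is that anything protected by the global lock may
have changed by the time the allocation returns, so state examined
before the call has to be rechecked afterwards.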

The concern about the funky allocators in afs_osi_alloc.c is that it's
generally a bad idea for a subsystem to maintain its own freelists,
since that is memory that, once allocated, can never be reclaimed for
another purpose.  I'm guessing that in traditional (4BSD)
implementations, ``small'' allocations were done in mbufs, ``medium''
allocations used mbuf clusters, and ``large'' used I/O buffers,
correct?

-GAWollman