[OpenAFS-devel] Refactoring the Solaris libafs code base

Marcus Watts mdw@umich.edu
Sun, 31 Dec 2006 22:11:35 -0500


> On Sat, 30 Dec 2006, Marcus Watts wrote:
> 
> > Kernel side building is not nearly as standardized; there is no
> > completely portable way to handle includes.  Since the kernel isn't
> > (usually) linked against libc, many functions that are present in
> > userland code aren't present inside the kernel, so files such as
> > stdio.h simply don't make sense there.
> 
> Excuse my ignorance
> 
> The libc argument to me would make a lot more sense if the kernel module
> was a shared object or used shared libraries. But since they are
> static, and don't include shared libs the symbols should already be
> included at link time, which gives credence to Mr. Allbery's suggestion of
> just dropping the guards altogether.

This doesn't have anything to do with static vs. dynamic linking.  In
fact, it doesn't have anything to do with linking at all, since even in
userland code, it makes no difference where the functions declared in
<stdio.h> come from, or exactly how early or late those entry points
are bound to load addresses.  And although it doesn't matter here, the
premise isn't quite right either: on nearly every modern platform, the
kernel provides some form of platform-specific kernel shared object
support ("kernel modules" or "kernel extensions"), which openafs uses
to incorporate the cache manager into the system as part of the
filespace.

What really matters here is that the kernel is a "standalone" program.
That is, the system services that libc expects to see (system calls,
etc.) do not exist and in their place is pretty much the naked
hardware.  In a standalone program, the *entire* function name space is
available, and what functions are provided does not have any necessary
connection to libc *at all*.  In the unix kernel code, some entry points
(for instance, getlogin()) do not exist, while others (signal())
may exist but perform entirely different functions.  Way back in 7th
edition unix, the kernel signal() function did something rather
analogous to what the present-day pthread_cond_broadcast() function
does.  In the Linux kernel, things are yet again different.
And then there's "user mode linux", which is even more strange.
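As a rough illustration of what "the entire function name space is available" means in practice: kernel code that needs a string routine typically supplies its own instead of assuming libc's.  The sketch below is hypothetical (the name k_strlcpy is invented here, not an OpenAFS or kernel API) and deliberately pulls in only <stddef.h>, which C defines as a freestanding header, so nothing in it depends on libc.

```c
#include <stddef.h>  /* freestanding header: size_t only, no libc functions */

/* Hypothetical kernel-local helper.  Nothing here comes from libc, so the
 * name and semantics are whatever this code base decides they are. */
size_t k_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = 0;

    /* Measure the source string. */
    while (src[len] != '\0')
        len++;

    /* Copy at most size - 1 bytes and always NUL-terminate when size > 0. */
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        dst[n] = '\0';
    }
    return len;  /* BSD strlcpy convention: caller can detect truncation */
}
```

The point is not this particular function; it's that in a standalone program a name like this means exactly what the local code says, which is also why a kernel signal() can mean something unrelated to the libc one.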

> 
> The netutils.c argument Mr. Altman gave would make a lot more sense if
> afsd was actually a kernel module, but since it just interfaces with the
> kernel module and is already dynamically linked against libc, it doesn't
> make quite as much sense to me. But if I am wrong...

In unix/linux, afsd is a simple userland program which makes special
system calls into the already existing cache manager.  Effectively,
it's supplying some startup parameters for the cache manager,
and donating several points of control or "threads" to the cache manager.
For systems that do dynroot, one of those threads is doing kernel "upcalls"
to perform DNS lookups.  (Some kernels, including solaris and aix, have
another way inside the kernel to specifically create kernel threads with no user
context.)

In windows, afsd is the userland cache manager - an entirely different
monster.  For openafs, afsd as distributed hooks into the system as an
SMB fileserver, which creates some problems.  Several existing
experimental implementations of afsd on windows are instead structured
as "kernel IFS stubs / userland afsd cache manager".  Arla on linux
works the same way.  In all these cases the cache manager lives in
afsd, in userland, so it needs no special kernel headers.

Jeff Altman has proposed doing a windows cache manager that works more
like openafs presently does on linux/unix.  Basically, he's proposing
to write code using the same runtime environment as a windows NT device
driver.  The windows "kernel shared object" is a dll - built just like
userland code.  However, real userland programs on windows typically
go through 3-4 layers of runtime libraries to get libc functionality,
graphics calls, win32 runtime calls, & posix calls.  In a driver,
none of those layers are present; only a slightly mutated version of
the underlying NT runtime environment is.  What that means is
that, even though the windows object linking convention is virtually
identical between kernel & userland code, Jeff's going to need kernel
include files and source changes that are vastly different from
anything that's in the openafs source today.  This is an ambitious
project, and it's not going to get done immediately.  The time to worry
about this is when Jeff has something that works and it's time to merge
all the bits back together.

> 
> netutils.c  does have:
> #include <stdio.h>
> #ifdef HAVE_STRING_H
> #include <string.h>
> #endif
> #ifdef KERNEL
> (paraphrased)
> 
> Which isn't consistent with not including userland symbols in kernel code.

netutils.o isn't built for the kernel.  That's a good thing, because
it uses fopen/fgets/fclose which are part of stdio.  It is used in
libuafs - so KERNEL in that file really means UKERNEL.
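To make the KERNEL/UKERNEL distinction concrete, here is a hypothetical sketch (not the actual netutils.c source): a stdio-based file parser that refuses to compile into a real kernel module while remaining usable in a UKERNEL build such as libuafs.  KERNEL and UKERNEL are the macros discussed above; the function count_entries and the "one entry per line, # for comments" file format are assumptions made for illustration.

```c
/* Hypothetical guard: fail loudly if this is ever built into the real
 * kernel module, but allow UKERNEL (libuafs), which runs in userland. */
#if defined(KERNEL) && !defined(UKERNEL)
#error "this file uses stdio (fopen/fgets/fclose) and is userland-only"
#endif

#include <stdio.h>
#include <string.h>

/* Hypothetical helper in the spirit of netutils.c: count the non-blank,
 * non-comment lines of a file (e.g. one address per line).
 * Returns -1 if the file can't be opened. */
int count_entries(const char *path)
{
    char line[256];
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = line + strspn(line, " \t");  /* skip leading blanks */
        if (*p == '\0' || *p == '\n' || *p == '#')
            continue;                          /* blank line or comment */
        count++;
    }
    fclose(fp);
    return count;
}
```

With the #error in place, an accidental attempt to link such a file into the kernel module fails at compile time instead of producing a module with unresolved stdio symbols.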

> Since the kernel modules are statically linked, I could drop the guards as
> Mr. Allbery suggested to clean up the code, and so I submitted a patch
> that did just that.
> 
> Given what you said, for consistency's sake... I could leave the guards on
> for string.h, and put guards on stdio.h, stdlib.h, etc. and then in
> afs/sysincludes.h for say string.h

src/afs/sysincludes.h - this is kernel code.  There should be no
occurrence of HAVE_STRING_H - and you should not touch the logic
that includes linux/string.h in linux, or string.h in darwin.

src/afs/UKERNEL/sysincludes.h
In my copy, at least, it includes string.h with no ifdef guard.  That's
fine, since UKERNEL code runs in userland.

> at the top add:
> #ifdef HAVE_STRING_H
> #undef HAVE_STRING_H
> #endif

Definitely not.  Kernel code should not know or care if HAVE_STRING_H is set.
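A simplified, hypothetical sketch of the shape this takes (not the actual contents of src/afs/sysincludes.h): kernel code chooses headers by platform, and only the userland/UKERNEL branch uses the ordinary libc header.  The platform macros follow OpenAFS's AFS_*_ENV naming, but check the real param files and sysincludes.h before relying on the exact names; name_length is invented for illustration.

```c
/* Kernel vs. userland include split.  Note that nothing here consults
 * HAVE_STRING_H, which is a userland configure result. */
#if defined(KERNEL) && !defined(UKERNEL)
# if defined(AFS_LINUX20_ENV)
#  include <linux/string.h>   /* Linux kernel's own string routines */
# elif defined(AFS_DARWIN_ENV)
#  include <string.h>         /* Darwin's kernel build provides this */
# endif
#else
# include <string.h>          /* userland and UKERNEL: ordinary libc header */
#endif

#include <stddef.h>

/* Hypothetical caller: works in any branch above that declares strlen(). */
size_t name_length(const char *name)
{
    return strlen(name);
}
```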

> 
> and in the solaris ifdef put
> #include <sys/systm.h>
> 
> (which has symbols for a lot of the string functions; I am not sure
> about other platforms.  Actually, the symbols in string.h are supposedly
> safe for the kernel on solaris, or at least that is my brief
> understanding, and sys/string.h on linux just includes string.h, at
> least in the 2.6 kernel.)

solaris, linux & other kernel environments vary greatly.  You should never
assume that what's true for one is even vaguely true for any other.
In many systems, the include files include each other in unobvious ways.
For instance, in solaris, including <sys/exec.h> includes <sys/systm.h>.
Openafs does this too.  Look at the "ifdef KERNEL" glop that goes into header
files spit out by rxgen.  What's worse, on some architectures, many
kernel include files lack the usual guard against including multiple
times, so if you include them more than once, you can actually break
things.  So, if the solaris build complains about undefined string
functions, including <string.h> for solaris *may* be appropriate.  In
other environments it may be precisely wrong.  Don't change things
because you think a change ought to be needed.  Change things because you
actually see problems, or at least warnings.  If you change things, *test*.
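The "usual guard" is the #ifndef/#define include-guard idiom.  In this minimal sketch (the header name cm_example.h and its contents are hypothetical), the header text is pasted twice to mimic a file being pulled in both directly and indirectly, the way <sys/exec.h> drags in <sys/systm.h> on solaris.

```c
/* ---- text of a hypothetical header "cm_example.h" ---- */
#ifndef CM_EXAMPLE_H
#define CM_EXAMPLE_H
struct cm_example {
    int refcount;
};
#endif /* CM_EXAMPLE_H */

/* ---- the same header text again, as a second (possibly indirect)
 * #include would produce.  CM_EXAMPLE_H is already defined, so the
 * preprocessor skips the body; without the guard this would be a
 * "redefinition of struct cm_example" compile error. ---- */
#ifndef CM_EXAMPLE_H
#define CM_EXAMPLE_H
struct cm_example {
    int refcount;
};
#endif /* CM_EXAMPLE_H */

int cm_example_init(struct cm_example *e)
{
    e->refcount = 1;
    return e->refcount;
}
```

A kernel header that lacks the guard gets no such protection, which is why an extra, seemingly harmless #include can break the build on one platform and be invisible on another.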

> 
> I'm not trying to be an ass, I am just trying to get some understanding
> and try to get some consistency so it isn't quite as hard of a maze to
> wallow through.

Yes.  It is complicated.  Worse yet, some of the inconsistency is
unavoidable.

> 
> Sean

				-Marcus