[OpenAFS] OpenAFS Solaris 10 port

Jeff Woodward Jeffrey.B.Woodward@Dartmouth.EDU
Tue, 16 Mar 2004 22:30:46 -0500 (EST)


For anyone interested in a port of OpenAFS to Solaris 10, I would be happy
to share my experiences, code base, and binaries. Presently, I have a
source tree based on the OpenAFS-1.2.10 distribution that compiles both
the 32- and 64-bit kernel modules using the Sun compiler under SPARC Solaris 10.
I have done preliminary testing of the client using the 64-bit kernel module,
which seems to be working well (see open issues below); I have tested
most of the major client utilities. I have not yet tested the AFS server
functionality on this platform (nor have I patched for the known "date
roll-over bug" that exists in the 1.2.10 server code base).

For those who care, more details are included below. If early access to
this work is desired, please send me email directly. If the requests are
overwhelming, I will make the files AFS and web accessible and repost to this
list; otherwise, I will be posting "proper" changes against the CVS tree
to the devel list for inclusion in some future release of OpenAFS.

-Jeff Woodward
 Project Manager - Systems and Development
 The fMRI Data Center
 Dartmouth College



Overview of Current Work
------------------------
(*) Assigned SYS_NAME_ID_sun4x_510 to be 941 in src/config/afs_sysnames.h
(*) Created param.sun4x_510.h and param.sun4x_510_usr.h files in src/config
(*) Numerous changes to src/libafs/MakefileProto.SOLARIS.in to add
    sun4x_510 system type.
(*) Added includes for cred_impl.h to src/afs/sysincludes.h. Caveat: this
    may not be the right thing to do, as it appears that cred_impl.h isn't
    supposed to be publicly exposed. It compiles and works for the moment,
    but my guess is that it is subject to break in future Solaris releases.
(*) Hack to src/venus/Makefile.in to pick up sysname sun4x_510 in order
    to get the correct libraries for linking kdump.
(*) Method for traversing network interfaces has changed within the
    Solaris "ip" kernel module -- resulting in changes to
    src/afs/SOLARIS/osi_vfsops.c, src/afs/afs_server.c, and
    src/rx/SOLARIS/rx_knet.c
(*) tv_usec and tv_sec changed(?), affecting struct timeval -- resulting in
    changes to src/afs/afs_osi.h (I got away with typedef'ing
    osi_timeval_t as a plain old struct timeval rather than using the
    one included in the afs source with afs_int32 tv_sec and tv_usec typed
    members -- no claim that this was the right choice).
(*) Solaris 10 seems to have made somewhat extensive changes to the VFS
    interface. I am flying blind here as I don't have source code for
    Solaris 10, so I based my changes on the comments and changes in the
    sys/vfs.h and sys/vnode.h header files. These changes resulted in
    modifications to src/afs/SOLARIS/osi_vfsops.c and
    src/afs/SOLARIS/osi_vnodeops.c as well as src/afs/VNOPS/afs_vnop_read.c
    and src/afs/VNOPS/afs_vnop_write.c. A little bit of clean up is still
    needed here. Perhaps the most nebulous of the changes to the VFS interface
    is the addition of the 'caller_context_t*' parameter to the vop_read
    and vop_write vnodeops, which subsequently affects the VOP_READ and
    VOP_WRITE macros. I am not an avid kernel hacker, so I don't know if
    this construct exists in other operating systems, nor do I know what
    necessitated this change on Sun's part. Nonetheless, I updated the
    afs_vmread and afs_vmwrite function definitions accordingly but I
    didn't add any code to utilize the caller_context. Likewise, I pass a
    caller_context_t* parameter to VOP_READ and VOP_WRITE but I have no
    idea "why" other than it is now part of the API. For now it seems to
    be working (see also the open issues).
(*) Solaris 10 seems to have made some minor (?) changes to the sockfs
    kernel interface. In particular, the sounbind() function is gone! This
    resulted in changes to src/rx/SOLARIS/rx_knet.c. See open issues
    below.
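
The caller_context_t change described above can be modelled in user space.
What follows is only a minimal sketch, not the actual patch: every mock_*
type is a hypothetical stand-in for the corresponding Solaris kernel type,
AFS_SUN510_ENV is an assumed macro name patterned after the existing
AFS_SUN5x_ENV style, and passing a null context is an assumption that the
Solaris 10 headers may or may not endorse. It shows one wrapper macro hiding
the extra parameter so that call sites build against either signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed macro name; comment it out to model the pre-Solaris 10 signature. */
#define AFS_SUN510_ENV 1

/* Hypothetical stand-ins for kernel types, not the real Solaris definitions. */
typedef struct mock_caller_context { int unused; } mock_caller_context_t;

struct mock_vnode;

struct mock_vnodeops {
#ifdef AFS_SUN510_ENV
    int (*vop_read)(struct mock_vnode *vp, char *buf, size_t len,
                    mock_caller_context_t *ct);
#else
    int (*vop_read)(struct mock_vnode *vp, char *buf, size_t len);
#endif
};

struct mock_vnode {
    const struct mock_vnodeops *ops;
    const char *data;               /* pretend file contents */
};

/* One call-site macro for both signatures; the new one gets a null context. */
#ifdef AFS_SUN510_ENV
#define MOCK_VOP_READ(vp, buf, len) \
    ((vp)->ops->vop_read((vp), (buf), (len), NULL))
#else
#define MOCK_VOP_READ(vp, buf, len) \
    ((vp)->ops->vop_read((vp), (buf), (len)))
#endif

/* Read handler that accepts the context but does not use it. */
#ifdef AFS_SUN510_ENV
static int mock_afs_read(struct mock_vnode *vp, char *buf, size_t len,
                         mock_caller_context_t *ct)
#else
static int mock_afs_read(struct mock_vnode *vp, char *buf, size_t len)
#endif
{
    size_t n;
#ifdef AFS_SUN510_ENV
    (void)ct;                       /* present only to match the new API */
#endif
    assert(vp != NULL && buf != NULL);
    if (len == 0)
        return 0;
    n = strlen(vp->data);
    if (n > len - 1)
        n = len - 1;                /* leave room for the terminator */
    memcpy(buf, vp->data, n);
    buf[n] = '\0';
    return (int)n;
}

static const struct mock_vnodeops mock_afs_ops = { mock_afs_read };

/* Helper: read the given contents through the wrapper macro. */
int mock_read_file(const char *contents, char *buf, size_t len)
{
    struct mock_vnode vn;
    vn.ops = &mock_afs_ops;
    vn.data = contents;
    return MOCK_VOP_READ(&vn, buf, len);
}
```

Calling mock_read_file("afs", buf, sizeof buf) goes through the same macro
whichever signature is selected, so only the macro and the handler
prototypes need to be conditional.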


Open Issues
-----------
(*) libtermlib.a - the Solaris 10 base has no such library. For now, it was
    copied from a Solaris 9 system, but work should be done to eliminate
    the dependency on it.
(*) vfsck is not compiling (so I removed it from Makefile) -- I am not
    ready to test the server; in addition, I tend to configure servers with
    the NAMEI interface. In short, vfsck is a very low priority for me.
(*) Since sounbind() is missing, the socket for RX on port 7001 remains
    "mostly" bound after afs is shut down and /afs is dismounted. I have
    tried various strategies such as "shutting down" and "closing" the
    socket, but even with that, it remains "bound" (as can be seen in the
    output of netstat -an). Attempting to restart afs without rebooting
    does one of three things: 1) RX fails to start, preventing afs from
    starting, 2) everything appears to start, but accessing /afs gives
    I/O errors, or 3) the system panics. This may not be a big deal to
    most people, since it is only an issue if you attempt to stop afs and
    restart it without rebooting. Many people claim that never works
    anyway... I see no reason for it not to work if I can get the RX port
    unbound. I have an email in to the Solaris 10 kernel team for direction.
(*) I plan to email the Solaris 10 kernel team asking for more information
    regarding the caller_context_t* parameters in vop_read/vop_write.
(*) Proper conditional compilation of the changes noted above for the
    Solaris 10 (sun4x_510) platform - currently, my source tree is not
    backwards compatible with prior versions of Solaris...

Everything else is stuff that I have explicitly not tested (some I may,
others I may never test :-)

(*) HAVE NOT TESTED THE AFS SERVER COMPONENT - on my list of TO DOs
(*) PAM module - not yet tested.
(*) 32-bit SPARC kernel module is untested (I don't have any SPARCs that
    I boot with a 32-bit kernel -- hint: I am not likely to ever test
    this).
(*) x86 Solaris - have not attempted necessary changes for x86 arch. Code
    changes are *probably* good to go, but the changes to Makefiles, param
    files, and afs_sysnames.h have not been attempted. Not sure that I
    have Solaris 10 for x86 installed [yet]...
(*) nfs translator - not tested (I have never attempted to use the nfs
    translator in any version of OpenAFS for any platform -- hint: I am
    not likely to ever test this).
(*) memory cache - not yet tested but will do soon.
(*) dynamic roots - tested only briefly (seemed to work)
(*) there are probably a dozen other switches/options that I don't
    normally use or would think to test.


Next Steps
----------
(*) Check out latest development branch from CVS and integrate changes
    with "proper" conditional compilation to maintain backwards
    compatibility.
(*) Integrate suggestions from the Solaris 10 kernel team (if any are
    received).
(*) Test the AFS server component.
(*) It would be nice if vfsck would at least compile; however, I am not
    eager to claim that it would work even if I got it to compile :-)
    Hopefully somebody "more qualified" than I will help out here...
(*) Post patch files to the devel list for evaluation and [hopefully]
    adoption.