[OpenAFS-devel] Solaris 10 predicament update
Dale Ghent
daleg@umbc.edu
Sat, 8 Sep 2007 19:46:54 -0400
I'm using this email to report on the problem, what I've found, and
lay out what our options are.
Background
With the advent of Solaris 10 8/07 (aka s10u4), the internal Private
kernel interfaces AFS used to access network interface properties
changed due to the integration of the pfhooks/netstack feature.
Specifically, an argument was added to the ill_* functions to
accommodate the netstack changes. This rules out using them if we
want to maintain AFS driver binary compatibility across all
permutations of the Solaris kernel.
Situation
The AFS driver code uses the ILL_* macros and functions (defined in
<inet/ip.h>) to walk a list of network interfaces and, as is the case
in SetServerPrefs() in src/afs/afs_server.c, pick the best interface
to bind to in order to talk to the AFS server holding the cell's root
volume. They are also used in src/rx/SOLARIS/rx_knet.c to get the MTU
setting of the interface an rx packet was received on; the retrieved
value is used to adjust the RX UDP packet size to prevent
fragmentation.
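
To make the MTU adjustment concrete, here is a minimal user-space sketch
of the arithmetic involved. It is not the code in rx_knet.c. It assumes
IPv4 without options (20-byte header) and an 8-byte UDP header, and the
function name is made up for illustration:

```c
#include <stddef.h>

/* Header sizes assumed for this sketch: IPv4 without options, plus UDP. */
#define IPV4_HDR_LEN 20
#define UDP_HDR_LEN   8

/*
 * Hypothetical helper: largest UDP payload that fits in a single IPv4
 * datagram on a link with the given MTU, i.e. without fragmentation.
 * Returns 0 if the MTU cannot even hold the two headers.
 */
size_t
max_unfragmented_udp_payload(size_t mtu)
{
    size_t overhead = IPV4_HDR_LEN + UDP_HDR_LEN;

    return (mtu > overhead) ? mtu - overhead : 0;
}
```

Under those assumptions a 1500-byte Ethernet MTU leaves 1472 bytes of UDP
payload; any RX-level header would come out of that as well.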
My research has concluded that there are no straightforward Public
interfaces in the Solaris kernel which exist all the way back to
Solaris 10 FCS. Also, there are no Private interfaces which directly
address our needs and are stable back to Solaris 10 FCS.
What to do?
There are a few alternatives we can consider, and I'd like to present
them for discussion... ordered from "most likely" to "least likely":
1) We can mimic what we've traditionally done and instead of using
ILL_*, use the Public ldi_ioctl() interface to make sockio calls to
/dev/udp and fill Private structs with returned network interface
information. While this may be alright to do in the case of
SetServerPrefs(), it would be a huge performance impact in the rx
code. When an rx UDP packet is received via , the call stack looks like this:
rxi_ReceivePacket->rxi_FindConnection->rxi_FindPeer->
rxi_InitPeerParams()->rxi_FindIfMTU()->rxi_GetIFInfo()
Both rxi_FindIfMTU() and rxi_GetIFInfo() walk the ILL structs to get
the interface address and MTU and, from what I can tell, they do this
for *every* *received* *packet*. So, given that AFS seems rather
obsessive about staying up-to-date on an interface's MTU, it would
mean that we would be doing ioctls on a file (/dev/udp) for every rx
packet we get. This would be hellishly expensive. Would this be a
correct assumption?
2) Option 2 would be to use the above mentioned ioctl-based method,
but to remove it entirely from the critical code path. We could, at
AFSinit() time, create a worker thread which would periodically
update a global struct of interface telemetry. The worker thread
would wake up every, say, 30 seconds (tunable), lock the struct via
mutex, update it, unlock, and return to sleep. The RX and
SetServerPrefs() code can read their desired values from this struct when
they need it, spinning if need be.
3) This is Rob's idea, so blame him if you reel back in horror. We
set a conditional by testing for a netstack symbol in the kernel ip
module. If TRUE, we have a function pointer that points to the new
ILL_* functions with the extra argument. If FALSE, we point to the old
ones. Yum. This would certainly involve the least amount of code.
4) We toss caution to the wind and let modern routers deal with UDP
frags the way they should, and dispense with the UDP packet size
adjustments based on MTU, or at least nail them to 1500. If you're
still using AFS over a PPP connection... well... sorry 'bout that. We
also let the kernel routing table do its job and dispense with
selecting interfaces. I don't think even the NFS code jumps through
these kinds of hoops. Is there a reason we should be? I admit I'm not
too familiar with the inner details and history of things here, so
feel free to gently clue me in.
5) Continue to use the ILL method and release OpenAFS 1.4.5 with the
code being compatible with s10u4. We simply tell people that if you
want to run OpenAFS client version 1.4.5 or greater, you also need to
run Solaris KU 120012-14 (x86) or whatever the analog is if you're
running SPARC.
6) Any other idears?
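
For discussion, option 2 could look roughly like the following. This is
a user-space analogue using POSIX threads, not kernel code; a real
version would use the kernel's own thread, mutex and condition-variable
primitives. The struct layout, all names, and the query_mtu_slow()
placeholder (standing in for the ioctl-based query) are assumptions made
up for this sketch:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot of what the RX and SetServerPrefs() code read. */
struct if_info {
    unsigned int  mtu;
    unsigned long refreshes;    /* number of updates done by the worker */
};

static struct if_info  g_if_info;
static pthread_mutex_t g_if_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_if_wake = PTHREAD_COND_INITIALIZER;
static bool            g_if_stop;
static long            g_if_interval_ms = 30 * 1000;    /* tunable */

/* Placeholder for the expensive ioctl-based query; returns a fixed MTU. */
static unsigned int query_mtu_slow(void) { return 1500; }

/* Worker: refresh the global struct, then sleep for the interval. */
static void *if_info_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_if_lock);
    while (!g_if_stop) {
        /* Do the slow query without holding the lock. */
        pthread_mutex_unlock(&g_if_lock);
        unsigned int mtu = query_mtu_slow();
        pthread_mutex_lock(&g_if_lock);
        g_if_info.mtu = mtu;
        g_if_info.refreshes++;

        /* Absolute deadline for the next refresh. */
        struct timespec until;
        clock_gettime(CLOCK_REALTIME, &until);
        until.tv_sec  += g_if_interval_ms / 1000;
        until.tv_nsec += (g_if_interval_ms % 1000) * 1000000L;
        if (until.tv_nsec >= 1000000000L) {
            until.tv_sec++;
            until.tv_nsec -= 1000000000L;
        }
        /* Sleep until the deadline passes or we are asked to stop. */
        while (!g_if_stop) {
            if (pthread_cond_timedwait(&g_if_wake, &g_if_lock, &until) != 0)
                break;          /* timed out: time to refresh again */
        }
    }
    pthread_mutex_unlock(&g_if_lock);
    return NULL;
}

/* Reader side: copy the snapshot out under the lock. */
struct if_info if_info_read(void)
{
    pthread_mutex_lock(&g_if_lock);
    struct if_info copy = g_if_info;
    pthread_mutex_unlock(&g_if_lock);
    return copy;
}

/* Ask the worker to exit and wait for it. */
void if_info_stop(pthread_t tid)
{
    pthread_mutex_lock(&g_if_lock);
    g_if_stop = true;
    pthread_cond_signal(&g_if_wake);
    pthread_mutex_unlock(&g_if_lock);
    pthread_join(tid, NULL);
}
```

The point of the design is that the per-packet path only ever pays for a
mutex and a struct copy in if_info_read(); the expensive query happens
once per interval in the worker, at the cost of the data being up to one
interval stale.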
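
Option 3 boils down to picking one of two implementations once and then
calling through a function pointer. Here is a small user-space sketch of
that pattern. Everything in it is a stand-in: the two lookup functions,
the netstack_stub type, and the boolean that represents the result of
the symbol test are hypothetical, and the differing return values exist
only so the two paths can be told apart:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the extra argument added by the netstack changes. */
struct netstack_stub { int id; };

/* Stand-in for the pre-netstack variant (no extra argument). */
static int lookup_mtu_old(const char *ifname)
{
    (void)ifname;
    return 1500;
}

/* Stand-in for the netstack-aware variant (extra argument). */
static int lookup_mtu_new(const char *ifname, struct netstack_stub *ns)
{
    (void)ifname;
    return (ns != NULL) ? 9000 : -1;
}

/* One uniform signature for the rest of the code to call through. */
typedef int (*mtu_lookup_fn)(const char *ifname);

static struct netstack_stub g_default_stack = { 0 };

/* Adapter: supplies the extra argument so both variants look alike. */
static int lookup_mtu_new_adapter(const char *ifname)
{
    return lookup_mtu_new(ifname, &g_default_stack);
}

/*
 * Choose the implementation once, based on whether the netstack symbol
 * was found. In this sketch the result of that test is passed in.
 */
mtu_lookup_fn select_mtu_lookup(bool have_netstack_symbol)
{
    return have_netstack_symbol ? lookup_mtu_new_adapter : lookup_mtu_old;
}
```

Callers would store the result of select_mtu_lookup() at init time and
never need to know which kernel they are running on.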
/dale
--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705