[OpenAFS-devel] pthreading the bosserver

Thu, 8 Aug 2013 15:40:08 -0400 (EDT)

Hi all,

It looks like the last hurdle before a core rxgk spec gets through 
afs3-stds is the description of the GSS negotiation loop.  The right 
answer there is well-known, though describing it is not easy, so it should 
just be a matter of time before it's done.

With just the core rxgk spec done, we can still start adding rxgk to the 
non-AFS3 protocol pieces in our tree, and I think the bosserver is the 
most promising target.  This allows us to get some code in the tree for 
the security class and get some working experience, and reduces the burden 
for developing additional code out-of-tree.

However, the bosserver is currently using LWP for parallelism, and GSSAPI 
libraries which are compatible with LWP are hard to come by; the obvious 
solution is to convert the bosserver to pthreads.  Chas Williams came up 
with an implementation about 8 months ago, which is currently languishing 
in gerrit (it just got a newly rebased version; the tip commit is gerrit 
number 8794).  I had a mostly complete independent implementation before I 
learned about his work, unfortunately; mine is in gerrit with the tip 
change 10130.  Mine is still incomplete in that it needs some build 
massaging on a few Unix platforms, and needs a bit of Windows attention. 
As tempting as it may be, I seem to recall that we do not have critical 
mass to drop server support for Windows on master, so that would need to 
be fixed for my code to move forward.

There are a few general differences between the approaches in the two 
patchsets, and I was hoping we could have an architectural discussion on 
this list.

First off: do we need to keep an LWP version of the bosserver around as 
well as a pthreaded one?  I don't think so, and I believe Simon agrees, 
but it would be good to get consensus.

Second, how strong of an integrity guarantee do we need for the bos 
config?  My understanding is that configuration changes (adding or 
removing or en/disabling bnodes) are rare events, and it is highly 
unlikely that multiple administrator connections changing things will be 
made concurrently.  If this is true, then we can rely on time-domain 
"locking" for synchronization and eliminate some aspects of code-level 
locking.  For example, a per-bnode lock acquired before writing any bnode 
state would not be needed, and a single global lock would be sufficient.

Relatedly, is it okay to assume that shutdown/restart/etc. will not be 
issued concurrently with config changes?  A "fully correct" implementation 
would seem to need to only shutdown/restart the bnodes which were 
configured when the command was issued, and ignore any new nodes created 
since then.  Because the implementation of shutdown/restart must drop 
locks, making this guarantee seems to require additional synchronization 
effort, whether via a temporary queue to store the bnodes being acted 
upon, or a higher-level lock.
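
To make the "temporary queue" option concrete, here is a minimal sketch, 
assuming a single global mutex and a simple linked list.  The names 
(demo_bnode, demo_lock, demo_snapshot, and so on) are made up for 
illustration and are not taken from either patchset or from the real 
bnode code.  The set of nodes is captured while the lock is held, and the 
command then acts only on that captured set, so nodes created after the 
command was issued are left alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bnode; not the real struct from the tree. */
struct demo_bnode {
    struct demo_bnode *next;
    int running;
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_bnode *demo_all;    /* list head, protected by demo_lock */

/* Add a node to the global list and mark it running. */
void demo_add(struct demo_bnode *b)
{
    pthread_mutex_lock(&demo_lock);
    b->running = 1;
    b->next = demo_all;
    demo_all = b;
    pthread_mutex_unlock(&demo_lock);
}

/*
 * Copy pointers to the currently configured nodes into a private array
 * while holding the lock.  Returns 0 on success, -1 if allocation fails.
 * A real implementation would also have to keep the captured nodes alive
 * (for example with a reference count) in case one is deleted meanwhile.
 */
int demo_snapshot(struct demo_bnode ***out, size_t *count)
{
    struct demo_bnode *b, **snap = NULL;
    size_t n = 0, i = 0;

    pthread_mutex_lock(&demo_lock);
    for (b = demo_all; b != NULL; b = b->next)
        n++;
    if (n > 0) {
        snap = malloc(n * sizeof(*snap));
        if (snap == NULL) {
            pthread_mutex_unlock(&demo_lock);
            return -1;
        }
        for (b = demo_all; b != NULL; b = b->next)
            snap[i++] = b;
    }
    pthread_mutex_unlock(&demo_lock);

    *out = snap;
    *count = n;
    return 0;
}

/*
 * Stop only the captured nodes.  The lock is dropped between nodes, so
 * nodes added in the meantime exist in the list but are not touched.
 */
void demo_stop_snapshot(struct demo_bnode **snap, size_t count)
{
    size_t i;

    assert(snap != NULL || count == 0);
    for (i = 0; i < count; i++) {
        pthread_mutex_lock(&demo_lock);
        snap[i]->running = 0;
        pthread_mutex_unlock(&demo_lock);
    }
    free(snap);
}
```

The cost of this approach is the extra allocation and the lifetime 
question noted in the comment; a higher-level lock avoids both but 
serializes config changes behind a potentially slow shutdown.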

Then there's the question of signal handling.  The discussion on 
gerrit/6947 raises some potentially large spectres, in particular 
LinuxThreads compatibility.  Chas has dedicated pthreads for each child 
process to listen for SIGCHLD, plus a global thread for SIGTERM/SIGQUIT 
(hmm, SIGKILL is added to the set, too, but can't actually be caught or 
blocked); my version just has a single signal handler routine that uses the 
sigpipe trick to wake up the bproc thread and check the children.  This 
sigpipe trick doesn't work directly on Windows; I'll need to look more 
carefully at how to work around that.  I seem to recall that we hand-roll signal 
compatibility bits for Windows anyway, so it would stay in-tree.  I 
haven't been able to convince myself that the additional complexity of the 
extra watcher threads is necessary, but if someone else could convince me, 
that would be good.

Off the top of my head, those are the main structural differences between 
the two existing implementations; I'd be interested to hear everyone's 
opinions on the questions.  I'm not tied to my code, but I did continue 
with it after learning about Chas's work because I did have some questions 
about these architectural issues.  If the consensus is that his stuff 
is fine, we should go with that -- my main goal is just to get a pthreaded 
bosserver in the tree so that we can build off of it with rxgk.

Thanks,

Ben