[OpenAFS-devel] Re: pthreading the bosserver

Benjamin Kaduk kaduk@MIT.EDU
Fri, 9 Aug 2013 11:32:29 -0400 (EDT)


On Fri, 9 Aug 2013, chas williams - CONTRACTOR wrote:

> On Thu, 8 Aug 2013 19:54:26 -0400 (EDT)
> Benjamin Kaduk <kaduk@mit.edu> wrote:
>
>>> For the restart cases ('restart -all' and 'restart -bosserver') the user
>>> (or bosserver itself) should probably be prevented from creating new
>>> bnodes since the bnode list shouldn't grow while the various shutdown
>>> procs are being applied.
>>>
>>> That would be easy enough to fix with a flag/lock.
>>
>> Well, I think your tempBnodes_lock is performing this function, or nearly
>> this function.  I am not convinced that the extra tempBnodes queue is
>> necessary, though; I think we can just have a flag and skip the extra
>> queue and lock.
>
> You need it.  You shouldn't hold the bnode list lock while running
> the proc passed into bnode_ApplyInstance() the proc's might need to
> lock the bnode list in order to change the reference counts on the
> bnodes.

You definitely can't hold the bnode list lock while running the procs in 
ApplyInstance, yes.  I'm not immediately convinced that dropping the bnode 
list lock implies that a temp list is necessary; I think an additional 
flag or write lock similar to your next paragraph may be sufficient.
Maybe, I should just try to implement my idea and we will see if it works 
or not.

> However, fixing Jeffery's issue with restarts is pretty easy though.  We
> just need another lock (essentially a modification/write lock) to block
> changes (add and deletes but not reference count changes) to the list of
> current bnodes while restart's are happening. Things like
> SBOZO_RestartAll() shold hold this lock to ensure that the multiple uses
> of bnode_ApplyInstance() apply to the same list of bnodes. Any racing
> bnode_CreateBnode() would eventually succeed (except in the cases were
> the bosserver is shutting down completely).
>
>> Interesting, that's not quite as I would have expected.  The reference
>> count is serving only to keep the bnode from being deleted, I thought, not
>> as a write barrier.  So I would not have thought the locking would add a
>> substantial overhead.  And yes, "totally bogus" was a bit overblown, it
>> was mostly okay but have a few places that were quite wrong.
>
> It does but you need to take the bnode list lock in order to change the
> reference count on the bnodes.  I considered a read/write lock strategy

Okay, yes.

> that would let you just have a single bnode list and avoid the temp
> queues.  But that seemed overly complicated for bosserver.  Now
> considering restartAll perhaps that would be a better choice.

This is more akin to my thinking, yes.

>> I'll keep looking at the windows stuff (maybe I just need to #include
>> <winsock2.h> and it will just work?) and per-bnode locks a bit more, but
>> at the moment, my leaning is that the easiest way forward would be to take
>> your code and replace the tempBnodes queue with the global flag mentioned
>> above.  Would you be okay if I went and did that?  This would leave us
>> with a tbozo, and we could get rid of the LWP copy after a bit of time.
>
> I have no way to test windows.  I can only compile test. You can try to

I only have a 1.7 build environment, not a master one.  Maybe it would 
just be a matter of installing the Secure Endpoints Kerberos development 
thingy that is required, I don't know.

> eliminate the temp list of bnodes if you like but I think there are
> easier ways to solve this problem.  Yes, it does look complicated but
> it really isn't -- It is a temporary list of nodes.

I think the reason I am having such a hard time accepting the temp list is 
that the word I use in my head to think about it is not "complicated" but 
rather "ugly". :)
We'll see if that still holds in a week...

-Ben