[OpenAFS-devel] Re: pthreading the bosserver
Benjamin Kaduk
kaduk@MIT.EDU
Fri, 9 Aug 2013 11:32:29 -0400 (EDT)
On Fri, 9 Aug 2013, chas williams - CONTRACTOR wrote:
> On Thu, 8 Aug 2013 19:54:26 -0400 (EDT)
> Benjamin Kaduk <kaduk@mit.edu> wrote:
>
>>> For the restart cases ('restart -all' and 'restart -bosserver') the user
>>> (or bosserver itself) should probably be prevented from creating new
>>> bnodes since the bnode list shouldn't grow while the various shutdown
>>> procs are being applied.
>>>
>>> That would be easy enough to fix with a flag/lock.
>>
>> Well, I think your tempBnodes_lock is performing this function, or nearly
>> this function. I am not convinced that the extra tempBnodes queue is
>> necessary, though; I think we can just have a flag and skip the extra
>> queue and lock.
>
> You need it. You shouldn't hold the bnode list lock while running
> the proc passed into bnode_ApplyInstance() the proc's might need to
> lock the bnode list in order to change the reference counts on the
> bnodes.
You definitely can't hold the bnode list lock while running the procs in
ApplyInstance, yes. I'm not immediately convinced that dropping the bnode
list lock implies that a temp list is necessary; I think an additional
flag or write lock similar to your next paragraph may be sufficient.
Maybe, I should just try to implement my idea and we will see if it works
or not.
> However, fixing Jeffery's issue with restarts is pretty easy though. We
> just need another lock (essentially a modification/write lock) to block
> changes (add and deletes but not reference count changes) to the list of
> current bnodes while restart's are happening. Things like
> SBOZO_RestartAll() shold hold this lock to ensure that the multiple uses
> of bnode_ApplyInstance() apply to the same list of bnodes. Any racing
> bnode_CreateBnode() would eventually succeed (except in the cases were
> the bosserver is shutting down completely).
>
>> Interesting, that's not quite as I would have expected. The reference
>> count is serving only to keep the bnode from being deleted, I thought, not
>> as a write barrier. So I would not have thought the locking would add a
>> substantial overhead. And yes, "totally bogus" was a bit overblown, it
>> was mostly okay but have a few places that were quite wrong.
>
> It does but you need to take the bnode list lock in order to change the
> reference count on the bnodes. I considered a read/write lock strategy
Okay, yes.
> that would let you just have a single bnode list and avoid the temp
> queues. But that seemed overly complicated for bosserver. Now
> considering restartAll perhaps that would be a better choice.
This is more akin to my thinking, yes.
>> I'll keep looking at the windows stuff (maybe I just need to #include
>> <winsock2.h> and it will just work?) and per-bnode locks a bit more, but
>> at the moment, my leaning is that the easiest way forward would be to take
>> your code and replace the tempBnodes queue with the global flag mentioned
>> above. Would you be okay if I went and did that? This would leave us
>> with a tbozo, and we could get rid of the LWP copy after a bit of time.
>
> I have no way to test windows. I can only compile test. You can try to
I only have a 1.7 build environment, not a master one. Maybe it would
just be a matter of installing the Secure Endpoints Kerberos development
thingy that is required, I don't know.
> eliminate the temp list of bnodes if you like but I think there are
> easier ways to solve this problem. Yes, it does look complicated but
> it really isn't -- It is a temporary list of nodes.
I think the reason I am having such a hard time accepting the temp list is
that the word I use in my head to think about it is not "complicated" but
rather "ugly". :)
We'll see if that still holds in a week...
-Ben