[OpenAFS-devel] Re: pthreading the bosserver

Benjamin Kaduk kaduk@MIT.EDU
Wed, 4 Sep 2013 14:11:58 -0400 (EDT)


On Wed, 4 Sep 2013, chas williams - CONTRACTOR wrote:

> On Wed, 4 Sep 2013 12:58:22 -0400 (EDT)
> Benjamin Kaduk <kaduk@mit.edu> wrote:
>
>> On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
>>
>>> On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
>>>
>>>> Maybe, I should just try to implement my idea and we will see if it works
>>>> or not.
>>>
>>> A sketch of what I had in mind is up at
>>> https://github.com/kaduk/openafs/commits/chas-bozo ; it passes minimal
>>> smoke-testing on my dev box (running a ptserver that doesn't have a prdb).
>>> It does not have the extra "shutting down" flag value that jhutz mentioned,
>>> for now.
>>>
>>> At 72 insertions/75 deletions, it doesn't really have a "smaller code"
>>> endorsement going for it ... what do people think?
>>
>> I'm assembling a changeset merging the still-useful bits of my patchset
>> with Chas's heavy lifting on the pthreading front, at
>> https://github.com/kaduk/openafs/commits/newbozo .  I changed the
>> commit-IDs for everything past the first commit (which was verified by
>> buildbot already), so as to not cause confusion on gerrit.  (I also
>> abandoned my gerrit changes for my old patchset.)
>>
>> Once I have squahsed, reviewed and tested things, I plan to push it to
>> gerrit for general review.
>
> I wrote what I think is a general fix for the problems mentioned in
> this thread but haven't had much time to test it.  It is attached.  It
> tries to match Jeff's wish list as much as possible.  Right now it lets
> all apply operations run at the same time on a fixed list of inodes.
> If you are in the middle of a restart, someone else can begin a restart
> as well.  The side effect is that someone else can also run a status
> as well.  If you wanted to serialize these, it would simply be changing
> the newBnodes_lock to a write lock in all cases.

Thanks for pulling that together, I'll look at that after I've finished 
looking over the commits so far.

> I am not thrilled about your code introducing what looks like C
> primitives with BNODE_LOCK/BNODE_UNLOCK.  Atleast put some ()'s on the
> end of the macros.  I am not sure this is entirely correct either:

I first wrote it as a proof-of-concept, and haven't done any cleanup since 
then.  (Your version is probably better ;) )

>    for (opr_queue_ScanSafe(&allBnodes, cursor, store )) {
>        struct bnode *tb = opr_queue_Entry(cursor, struct bnode, q);
>
>        bnode_Hold(tb);
>        BNODE_UNLOCK;
>        code = (*aproc) (tb, arock);
>        bnode_Release(tb);
>        BNODE_LOCK;
>
> While store holds the pointer to the next bnode, you didn't hold a
> reference to that bnode so it could potentially go away after
> BNODE_UNLOCK.

That sounds right.  I have not even really convinced myself that the 
queue_ScanSafe is necessary; regular queue_Scan might be sufficient.
I'll keep that in mind when I am at the polishing up stage.

Thanks again,

Ben