[OpenAFS-devel] Re: pthreading the bosserver
Benjamin Kaduk
kaduk@MIT.EDU
Wed, 4 Sep 2013 14:11:58 -0400 (EDT)
On Wed, 4 Sep 2013, chas williams - CONTRACTOR wrote:
> On Wed, 4 Sep 2013 12:58:22 -0400 (EDT)
> Benjamin Kaduk <kaduk@mit.edu> wrote:
>
>> On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
>>
>>> On Fri, 9 Aug 2013, Benjamin Kaduk wrote:
>>>
>>>> Maybe, I should just try to implement my idea and we will see if it works
>>>> or not.
>>>
>>> A sketch of what I had in mind is up at
>>> https://github.com/kaduk/openafs/commits/chas-bozo ; it passes minimal
>>> smoke-testing on my dev box (running a ptserver that doesn't have a prdb).
>>> It does not have the extra "shutting down" flag value that jhutz mentioned,
>>> for now.
>>>
>>> At 72 insertions/75 deletions, it doesn't really have a "smaller code"
>>> endorsement going for it ... what do people think?
>>
>> I'm assembling a changeset merging the still-useful bits of my patchset
>> with Chas's heavy lifting on the pthreading front, at
>> https://github.com/kaduk/openafs/commits/newbozo . I changed the
>> commit-IDs for everything past the first commit (which was verified by
>> buildbot already), so as to not cause confusion on gerrit. (I also
>> abandoned my gerrit changes for my old patchset.)
>>
>> Once I have squahsed, reviewed and tested things, I plan to push it to
>> gerrit for general review.
>
> I wrote what I think is a general fix for the problems mentioned in
> this thread but haven't had much time to test it. It is attached. It
> tries to match Jeff's wish list as much as possible. Right now it lets
> all apply operations run at the same time on a fixed list of inodes.
> If you are in the middle of a restart, someone else can begin a restart
> as well. The side effect is that someone else can also run a status
> as well. If you wanted to serialize these, it would simply be changing
> the newBnodes_lock to a write lock in all cases.
Thanks for pulling that together, I'll look at that after I've finished
looking over the commits so far.
> I am not thrilled about your code introducing what looks like C
> primitives with BNODE_LOCK/BNODE_UNLOCK. Atleast put some ()'s on the
> end of the macros. I am not sure this is entirely correct either:
I first wrote it as a proof-of-concept, and haven't done any cleanup since
then. (Your version is probably better ;) )
> for (opr_queue_ScanSafe(&allBnodes, cursor, store )) {
> struct bnode *tb = opr_queue_Entry(cursor, struct bnode, q);
>
> bnode_Hold(tb);
> BNODE_UNLOCK;
> code = (*aproc) (tb, arock);
> bnode_Release(tb);
> BNODE_LOCK;
>
> While store holds the pointer to the next bnode, you didn't hold a
> reference to that bnode so it could potentially go away after
> BNODE_UNLOCK.
That sounds right. I have not even really convinced myself that the
queue_ScanSafe is necessary; regular queue_Scan might be sufficient.
I'll keep that in mind when I am at the polishing up stage.
Thanks again,
Ben