[OpenAFS-devel] warnings fix
Jeffrey Altman
jaltman@secure-endpoints.com
Wed, 15 Jul 2009 12:01:06 -0400
Marcus Watts wrote:
> Jeffrey Altman <jaltman@secure-endpoints.com> writes:
> ...
>> None of these "interesting surprises" are specific to rxk5. Maintaining
>> them privately and mixing them into the monolithic rxk5 branch makes it
>> nearly impossible for them to be incorporated into the tree.
>> For each one of these items an RT ticket should have been created
>> at the time the issue was discovered.
>
> Please shoot the messenger more.
>
> I had originally planned to do more to split this patch out. I gave up
> when it was clear substantial portions of the patch had already become
> obselete, and it was not clear if *any* particular item addressed by the
> patch had value.
My comments are not about "shooting you" they are guidance on how
not to waste your time. How much time have you spent sorting through
your patches trying to pull out these issues? If you find a problem
like this, instead of fixing it in your tree and moving on, send a
bug report at the time. Let the rest of the community know the issue
is there and perhaps someone else will fix it for you. If not, you
send the patch when you have it. In either case, the problem will
addressed and in the meantime it will be documented that the issue
needs to be worked on and the likelihood of two independent developers
doing the same thing are reduced.
>>> /2/
>>> src/auth/cellconfig.c
>>> afsconf_GetCellInfo
>>> calls lcstring giving it the value acellName -- which means
>>> this function can scribble over (lower-case) the input cell name
>>> (a "surprise" side-effect).
>> I do not believe this is a bug. When I reviewed this issue several
>> years ago I concluded the side-effect was intended because all cell
>> names are case-insensitive and the normative representation is in
>> lower case.
>
> This started off with noticing that cellname being a "const" in some
> functions. Following that lead to this function.
> [ In fact, I dimly recall an unpleasant bug somebody had many
> years back with perhaps just this feature (passing in a kerberos
> 4 realm name...) It was still transarc code at the time, I think... ]
> In any case, the input parameter change is not necessary, is not used
> anywhere inside openafs, and I think not useful or desirable. If there
> were code that wanted this behavior, it could always call lcstring separately.
> Now that I think about it, this will certainly cause pr_Initialize()
> in turn to behave differently depending on the string fed into it.
> I look forward to seeing documentation on that behavior.
>
> Really, whether this counts as a bug or a feature depends on how it's documented.
Searching through the OpenAFS sources one comes across the
following comment:
/*
* We must copy the cellname because afsconf_GetCellInfo
* actually writes over the cell name it is passed.
*/
>>> /3/
>>> src/uss/uss_vol.o
>>> used vldbentry not nvldbentry.
>>> On little-endian machines, MapHostToNetwork won't see the right
>>> server count, and may do weird things.
>> There are many locations in the source tree where VL_GetEntryByID is
>> used instead of attempting to use the newer variants first.
>>
>> src/bucoord/volstub.c
>> src/uss/uss_vol.c
>> src/vlserver/vlclient.c
>>
>> RT # 125101
>
> If I remember right, VL_GetEntryByID returns a vldbentry,
> MapHostToNetwork takes a nvldbentry. The data offsets, starting
> with the server count, are different, so MapHostToNetwork is going
> to do something "funny". Well, maybe -- depends on machine
> architeture & input data. Past that -- I think I looked at
> other MapHostToNetwork uses and decided this was the only case
> that looked suspect.
I opened a ticket and did not submit a patch because there
are other reasons why using the N versions of the RPCs are
desirable and it was unclear from your previous description
what exactly the scope of the issue was.
> Is a site that lacks the db server support for the newer
> variants of VL_* actually likely to use the 1.5.x/cvs head
> versions of the uss code?
I doubt there are any such servers left given the ubik
time bug from several years ago.
>>> /10/
>>> src/volser/vos.c
>>> ListAddrs,
>>> For multihomed addresses, was calling VL_GetAddrsU again
>>> passing in &m_uuid, but this is a afsUUID *m_uuid, so this
>>> is actually trashing some part of the stack.
>> I am unable to find a location in ListAddrs() where m_uuid
>> is declared as "afsUUID *". Please double check this.
>> If you do find an issue, please submit a bug to RT or
>> a patch to gerrit.
>
> What you are seeing is the result of my trying to be efficient and
> comprehensive with very limited time.
> ...
> I'm sorry I wasn't more precise about this.
http://gerrit.openafs.org/101
> ...
>>> /12/
>>> src/viced/viced.c
>>> call to rx_NewServiceHost
>>> has nice neatly labelled arguments - scrambled by some
>>> automatic code formatter.
>> while this is a readability issue it is not a bug.
>> I'm not going to change this at the present time just for
>> the sake of reformatting the code.
>
> This one showed up here because this particular piece of code
> *has* to change for rxk5, also because it's a particularly
> glaring example of "automatic code formatter gone wrong".
> This was, after all, the "interesting surprises" section,
> not the "bugs" list.
>
> So, are you actually asking me to keep "beauty" changes as
> a monolithic whole with rxk5 changes?
Actually I would argue that you should leave the formatting
issues alone until a more global beautification is performed.
> Again, see above. "comprehensive" is surprisingly the least time
> consuming part. "detailed" and "accurate" makes it a lot more time.
Apparently full detail and accuracy is not required.
> "And we're going to commit patches that address some of your concerns
> meanwhile" means the solution didn't converge anymore.
OpenAFS is a huge project with 42 active developers (if you trust
ohloh.net). As the code quality improves the number of developers
making modifications to it increases. As Russ and Simon and others
have pointed out, attempts to maintain local changes against a fast
moving target are bound to fail. You must strive to isolate you
changes and get as much of them out of your private tree as fast
as possible. The longer you wait, the more likely it is that someone
else is going to duplicate your effort and beat you to the commit.
> The real question
> here was "was there anything useful I found that you didn't find?"
> If the answer is yes, arguing I not follow the path that got the useful
> stuff is probably not ideal.
The issue is not whether or not you found something previously unknown.
The question is how much effort got wasted in the process of getting to
this point. You are welcome to waste as much of your time as you would
like but at the end of the road:
* Don't blame the gatekeepers because you wasted it.
* Don't blame the gatekeepers for failing to incorporate your
code changes that were never delivered in a manner that
makes it feasible to commit.
* Don't blame the gatekeepers or anyone else in the community
for failing to review your protocol design and implementation
for a huge project when it is is presented in a monolithic
form filled with irrelevant code changes.
* Don't get upset when someone else in the community performs
roughly similar work and gets their effort committed before
you do.
We appreciate the fact that you brought these issues
to the attention of the community. I'm just sorry that you wasted
so much effort in the process of doing so.
On a separate not, I take great offense at your comment about "shaming"
the gatekeepers into doing something. Take a look at the analysis of
who actually has done something to improve OpenAFS
https://www.ohloh.net/p/openafs/contributors
What you will notice is that it is the gatekeepers that do the vast
majority of the work. That list is measuring authorship not commits.
In terms of cleanup work, less than 30% of the code base is still
the original IBM contribution. I leave it as an exercise to determine
using git who has performed most of that work.
Jeffrey Altman