[OpenAFS-devel] Re: Breaking callbacks on unlink

Russ Allbery rra@stanford.edu
Wed, 25 Jan 2012 11:05:42 -0800


Andrew Deason <adeason@sinenomine.net> writes:

> I do not agree that adding an option inherently implies that the project
> supports the use of it: "the use of this option is for experimentation
> and is strictly unsupported". Other projects do this; I thought Linux
> kernel devs often refused to look at any 'tainted' panic reports. (We
> already sort-of do this; some options are effectively documented as
> "don't change me".) I know that when you do something like that, people
> won't listen to it and will expect it to be supported anyway, but that
> doesn't change the fact that it's not.

> I understand the desire to not include those options, because it means
> you have to throw away bug reports, or accept them and effectively try
> to support them anyway, neither of which is good. But I mean, come on,
> for e.g. a hashtable size?

I think this is the core of the disagreement.  In the absence of resources
to fix the problem, I think adding a runtime option generally just makes
the problem worse, because now it behaves differently for different sites
and the total complexity of the source base and bug evaluation has
increased, resulting in even fewer resources and making it even less likely
that the problem will ever actually be fixed.

If we were in a position to actually test the various combinations of
options, that would be another matter, but adding officially unsupported
options doesn't help.  Whether you say officially that you don't support
the option or not, the conditionals in the code make the world more
complicated and open you up to a whole new set of possible bugs going
forward.

This is particularly true of AFS, which is not like most software for a
variety of reasons.  We know there are extremely subtle and complex
interactions between apparently unrelated issues, particularly around
protocol behavior such as what's discussed in this thread, that can cause
serious unexpected problems.  For example, look at the idledead problems
that have delayed 1.6.1 and that have caused serious production outages
for some sites, such as mine.  They're now being fixed properly, which is
the right thing to do; making them a configuration option would have just
meant leaving a landmine in the code and making it even harder to reason
about the logic and structure.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>