[OpenAFS] buildbot and packages

Russ Allbery <rra@stanford.edu>
Fri, 14 Sep 2012 14:36:00 -0700


Troy Benjegerdes <hozer@hozed.org> writes:

> All this talk about 'reliable code for our users' is total BS until
> 'make check' actually does some realistic functionality tests.

Speaking as someone who has worked on adding a test framework to OpenAFS
and who is obsessive about test-driven development for his own code, I
agree with you.  However, again, you have to be sensitive to how much work
this is.

We unfortunately have a code base right now that is almost devoid of
testing.  Even putting aside the importance of integration and performance
testing for AFS and focusing just on unit testing, which isn't as useful
but which is much, much easier (and which *would* be useful for a lot of
components; for example, I'd love to see a thorough unit test for the
demand-attach state model), retrofitting this sort of test suite onto
existing software is hard.

This is particularly true given that we have a code base in C, not Java or
another language with strong testing support.  That means we don't have
tools like Mockito, or patterns like dependency injection and inversion of
control, available to us to make testing easier.  In C, you effectively
have to write all that stuff yourself, including mock components so that
you can write effective tests without requiring a vast infrastructure to
be in place before any test can run.
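
To make the pattern concrete, here is a minimal sketch of hand-rolled
dependency injection in C: the code under test calls its external
dependency through a function pointer, and the test installs a mock in
its place.  The names are invented purely for illustration; this is not
OpenAFS code.

    /* Hypothetical example: a routine made testable by calling its
     * dependency on the network layer through a function pointer. */
    #include <stdio.h>
    #include <string.h>

    typedef int (*fetch_status_fn)(const char *fid, int *length);

    /* Production sets this to the real RPC; a test installs a mock. */
    static fetch_status_fn fetch_status;

    static int
    mock_fetch_status(const char *fid, int *length)
    {
        if (strcmp(fid, "1.2.3") == 0) {
            *length = 4096;
            return 0;
        }
        return -1;
    }

    int
    main(void)
    {
        int length = 0;

        fetch_status = mock_fetch_status;
        if (fetch_status("1.2.3", &length) == 0 && length == 4096)
            printf("ok - status fetched from mock\n");
        else
            printf("not ok - unexpected result\n");
        return 0;
    }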

To give you an example, I maintain a couple of PAM modules, and I decided
that I was sick and tired of doing development without a net and
retrofitted test suites onto both of them.  I now have an infrastructure
that I'm relatively happy with, one package entirely converted and one
partly converted, and test coverage of about 60% of the option space for
the fully converted package.  That work took about 40 hours of dedicated
programming effort, spent solely on the test suite.

I've been writing C for 20 years; I'm not particularly slow at it.  But
among the things that I had to write:

* A mock PAM library so that I could feed configuration into my modules
  and inspect state to do proper testing (a simplified sketch of the
  idea follows this list).

* Infrastructure to allow user configuration of test accounts and
  principals in a Kerberos realm that could be used for testing for the
  Kerberos-related components.

* A scripting language for writing tests, combined with the parser and
  implementation of that language, so that tests could be data-driven.
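
To give a feel for the first item, a mock PAM library mostly amounts to
re-implementing the handful of libpam entry points the modules call,
backed by a handle that the test controls.  A heavily simplified sketch
of the idea, with everything defined locally for illustration (the real
libpam interface is far larger and pam_handle_t is opaque):

    /* Fake pam_get_item(), assuming the module under test only ever
     * asks for PAM_USER.  Constants and the handle structure are
     * defined locally for this sketch. */
    #include <stdio.h>
    #include <string.h>

    #define PAM_SUCCESS  0
    #define PAM_BAD_ITEM 29
    #define PAM_USER     2

    typedef struct pam_handle {
        const char *user;   /* state the test configures */
    } pam_handle_t;

    int
    pam_get_item(const pam_handle_t *pamh, int item, const void **data)
    {
        if (item == PAM_USER) {
            *data = pamh->user;
            return PAM_SUCCESS;
        }
        return PAM_BAD_ITEM;
    }

    /* A test links the module against this fake library, feeds it
     * state, and then inspects what the module did with it. */
    int
    main(void)
    {
        pam_handle_t pamh = { "testuser" };
        const void *user;

        if (pam_get_item(&pamh, PAM_USER, &user) == PAM_SUCCESS
            && strcmp(user, "testuser") == 0)
            printf("ok - module sees the configured user\n");
        else
            printf("not ok\n");
        return 0;
    }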

This is *typical* of the kind of infrastructure that has to be developed
to retrofit test suites to large-scale C projects.

Furthermore, one of the reasons why test-driven development is so powerful
is that it changes how you write code.  You write
self-contained modules with clear interfaces because otherwise they're
untestable.  You insert annotation points and code injection points so
that you can instrument code and inspect state and ensure that it is
behaving properly.  You have to be much more careful about how
configuration and implicit system state knowledge is handled so that you
can configure things in a test mode.  All that makes your code much
better.
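
As a small, hypothetical illustration of what such an injection point can
look like in C: a state machine can report every transition through a
hook that is a no-op in production but lets a test observe and verify the
sequence of states.  The names are invented; this is not the demand-attach
code.

    /* Hypothetical instrumentation point for a state machine: a test
     * installs an observer; production leaves the hook NULL. */
    #include <stdio.h>

    enum vol_state { VOL_DETACHED, VOL_ATTACHING, VOL_ATTACHED };

    static void (*state_hook)(enum vol_state from, enum vol_state to);
    static enum vol_state current = VOL_DETACHED;

    static void
    set_state(enum vol_state next)
    {
        if (state_hook != NULL)
            state_hook(current, next);   /* annotation point for tests */
        current = next;
    }

    /* What a test's observer might do: record each transition so the
     * sequence can be checked against what the model requires. */
    static void
    record_transition(enum vol_state from, enum vol_state to)
    {
        printf("transition %d -> %d\n", from, to);
    }

    int
    main(void)
    {
        state_hook = record_transition;
        set_state(VOL_ATTACHING);
        set_state(VOL_ATTACHED);
        return 0;
    }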

But the flip-side of that is that a huge code base that *wasn't* written
using test-driven development techniques doesn't do any of those things.
The code is frequently almost untestable.  There are no clear functional
boundaries where you can inject test machinery.  There is little or no
isolation to make independently-testable modules.  The configuration
support is frequently insufficient to properly configure the code to run
in a testing mode.  You have to do *huge* refactorings of the code in
order to add all of that.

And, from the perspective of an external client, all that work is
overhead.  It doesn't add any new features, it rarely fixes any
significant bugs while you're doing it, and it takes a ton of time.  No
one, and I mean no one, pays you to do that sort of work.  I was able to
do this for my PAM modules because Stanford gives me huge amounts of
discretion in how I do my job and is used to me periodically taking some
time away from "real work" to do these weird things that only Russ cares
about.  And even there, there's a limit; those 40 hours of development had
to be spread across two calendar years, where I could steal some time here
and some time there.

So yes, preach the testing religion, brother!  But you simply have to be
realistic about the prospects of comprehensive testing of a huge existing
code base that was never designed to be testable.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>