Testing: thoughts on improving it

NOTE HXA7241 2011-01-30T09:17Z

What does testing really do? And how can it be improved? Thinking beyond the surface seems to lead away from testing and toward automation and ergonomics.

What testing does not do, and what should do that instead

‘Testing’ is something of a misnomer. It does not really test the functionality: only the final usage can do that. For testing to be truly testing, an independent objective reference is required. Do we have that when we write tests?

If you have information that is better than the information you used to design the code, why did you not derive the code from that information? And if the information you have is worse than that used to make the code, what is the point of testing against it?

You should get the best information and derive the code faithfully from that. Then the only testing to be done is against the human judgement of the user.

So the way to improve correctness is not by writing more and better tests. It is by making more and better ways of deriving code from available information. (And making the functionality more clearly and readily available to human judgement, but that is veering toward another topic . . .)

What testing does do, and how to improve that

Really, testing does two things: 1, it helps maintain consistency across code modifications; 2, it helps find random errors by producing a redundant description.

Both seem improvable by focusing on their real actions.

Consistency

This is indeed recognised as a main purpose of testing: you can refactor, or non-destructively modify the code, and the tests will show if it still works. But what testing is doing here is not really showing correctness. It is showing any break in consistency.

There is something important here. Since it is not testing correctness, that is, not depending on human judgement, but only comparing change, it can ideally be completely automated. We do not need to write ‘tests’ for this at all; it could all be generated.

We want something like an automated random comparison, to:

  • Find all matching interfaces of components/parts.
  • Generate random stimulus for them.
  • Compare responses before and after the modification.

This can be more comprehensive and thorough than manually written tests, and it requires no manual test-writing at all.
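
A minimal sketch of the idea, in Python. Everything here is hypothetical: the two versions of one component are assumed to be available side by side, the interface is assumed to take a list of integers, and the stimulus generator is hand-written rather than derived from the interface (which a real tool would do, by reflection over the matching interfaces).

  import random

  def old_version(xs):
      # the component before the modification (hypothetical)
      return sorted(xs)

  def new_version(xs):
      # the component after the modification (hypothetical):
      # same intent, different implementation
      if len(xs) <= 1:
          return list(xs)
      pivot = xs[0]
      return (new_version([x for x in xs[1:] if x < pivot]) + [pivot] +
              new_version([x for x in xs[1:] if x >= pivot]))

  def random_stimulus():
      # generate a random input matching the interface: a list of ints
      return [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]

  def check_consistency(old, new, trials=1000):
      # compare responses of the two versions over random stimuli;
      # any divergence is a break in consistency (not necessarily an error)
      for _ in range(trials):
          stimulus = random_stimulus()
          if old(list(stimulus)) != new(list(stimulus)):
              return stimulus   # a diverging input, kept for diagnosis
      return None

  diverging = check_consistency(old_version, new_version)
  print("consistent" if diverging is None else "divergence on: %s" % diverging)

Nothing component-specific was written as a ‘test’ here: given the interfaces, both the stimuli and the comparison can be generated.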

(Ultimately this is another case where testing is not the real answer: we should simply only allow consistency-maintaining transforms between code versions. But the possibilities and practical limits of that are another subject . . .)

Unanimity

Since the information used to write tests is (or should be) no better than that used to write the code, a test merely provides an alternative expression. Neither is more authoritative; both are equivalent. Testing is not really showing correctness here either: it compares two alternatives and checks their unanimity.

This is essentially like error-detecting/correcting codes for unreliable hardware. Software development starts with (relatively) good information, but then it passes through a noisy process: humans writing software. Neither the content nor the error-detection/correction code is more certain, but together they can, with good probability, reveal whether an error has happened.

This purpose should be more deliberately and fully addressed. We can have an alternative expression designed specially to work as error-correction for the particular kinds of human error in programming.

We want an alternative expression to be:

  • A single special language, learned once and used beside all other languages.
  • Easy to write – easier than the coding language.
  • Different in ways that catch common and important errors.

This should be more effective than normal testing, and less effort to produce.
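
A minimal sketch of what such an alternative expression could look like, again in Python and with all names hypothetical. The imperative code and the declarative properties are two independent expressions of the same intent, in the spirit of property-based testing tools such as QuickCheck; random stimuli check their unanimity.

  import random
  from collections import Counter

  def my_sort(xs):
      # the code under development: an insertion sort (hypothetical)
      result = list(xs)
      for i in range(1, len(result)):
          j = i
          while j > 0 and result[j - 1] > result[j]:
              result[j - 1], result[j] = result[j], result[j - 1]
              j -= 1
      return result

  def properties(input_xs, output_xs):
      # the alternative expression: declarative, and unlikely to share
      # the implementation's mistakes
      ordered = all(a <= b for a, b in zip(output_xs, output_xs[1:]))
      same_elements = Counter(input_xs) == Counter(output_xs)
      return ordered and same_elements

  for _ in range(1000):
      stimulus = [random.randint(-100, 100) for _ in range(random.randint(0, 30))]
      assert properties(stimulus, my_sort(stimulus)), "disagreement on: %s" % stimulus
  print("code and properties are unanimous")

Neither form is authoritative: a failed assertion says only that the two expressions disagree somewhere, and human judgement then decides which one (or both) is wrong.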

(It is even possible to see how to nearly automate this, by re-arranging/reversing the human activity. Instead of humans producing a different expression, have a different expression automatically generated, and have the humans judge whether it matches their requirements, a passive form of judgement. But practically, this seems weaker: it is too easy to review without paying attention, whereas demanding active expression forces the judgement to be made.)

Conclusion

We do not really want ‘testing’. We want other things that address the truer concerns: better software transforms (and better visibility of functionality), better tracing and comparing of modifications, and better error-correction of human coding. All seem very practically realisable and improvable.