Programming is probabilistic

NOTE HXA7241 2011-01-09T10:23Z

In common programming we use abstractions in a particular kind of probabilistic way. It is normally in the background, but it is fairly well defined, and it deserves more examination.

In each case of making or using an abstraction we make a trade-off. We avoid looking at the problem/solution/construction as closely as really needed, and instead just re-use components that are roughly right. This brings a significant everyday gain in productivity, but in exchange a small chance of substantial failure.

For example: we can easily add two numbers using built-in facilities, yet one day, when too large a number is added, it overflows and fails completely. Programming is full of this kind of construction: simple and usually fine, but in rare cases it will break.
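The overflow case can be sketched with a hypothetical `add_int32`, simulating fixed-width two's-complement addition (Python's own integers are arbitrary-precision, so the wraparound is modelled explicitly):

```python
# Sketch: simulate 32-bit two's-complement addition, the kind of
# built-in facility that is "simple and usually fine" but wraps
# around silently when the sum grows too large.
def add_int32(a, b):
    s = (a + b) & 0xFFFFFFFF               # keep only the low 32 bits
    return s - 2**32 if s >= 2**31 else s  # reinterpret as signed

print(add_int32(2, 3))                          # 5: the everyday case
print(add_int32(2_000_000_000, 2_000_000_000))  # -294967296: the rare break
```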

This is a particular way of mixing the needs of the programming activity with the requirements of the problem. And it seems measurable, or at least understandable, in quite a clear way – so what more can be found out and said about it? It seems more should be said. Making a trade-off is a normal, essential engineering practice. But we are not doing it in a very engineering way; we are doing it a bit carelessly.

Understanding it is a matter of analysing and classifying the varieties of probabilistic assumptions, and examining more closely their details.

Software engineering is about weighing up two broad interests: how useful, effective, and reliable particular abstractions are, and how much development effort is needed for that particular degree of precision and exactness of abstraction. Generally, we can be clearer and more exact about what we build – the abstractions we design – but it costs more effort. And more specifically, there are many kinds of abstraction, exactness, and effort to be considered and weighed.

We already have a vague sense of this. What we lack is a clear and extensive map and inventory of all the varieties and their features and costs.


One difference is notable: civil engineering rests largely on statistical properties of materials; software engineering rests largely on statistical properties of usage. It is not so clear-cut, but there is a different emphasis in software. It partly echoes, and is related to, the dominance of representation over function (we are more concerned with the manipulability of software source than with its algorithmic performance).

One consequence is that software depends heavily on something easy to manipulate. And this leads to security problems. Dangerous but unlikely events can be deliberately induced, and so become much more likely. The probabilities are themselves volatile. It is as if later events changed the probabilities and invalidated the earlier trade-off: the ease of the construction was no longer a good bargain, but the work had already been done.
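The shift in probability can be sketched numerically (hypothetical figures: a 16-bit limit, benign inputs drawn at random from a typical small range, versus inputs chosen by an adversary):

```python
# Sketch (hypothetical figures): an overflowing addition that is
# effectively impossible under benign usage becomes certain once an
# adversary chooses the inputs: the probabilities are volatile.
import random

INT16_MAX = 2**15 - 1

def overflows(a, b):
    return a + b > INT16_MAX

# Benign usage: inputs in a typical small range (0..1000) never overflow.
accidental = sum(overflows(random.randint(0, 1000), random.randint(0, 1000))
                 for _ in range(10_000))

# Deliberate usage: the attacker simply picks the worst case.
deliberate = overflows(INT16_MAX, 1)

print(accidental)   # 0
print(deliberate)   # True
```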