I hope this will provoke some new ideas about how to develop secure systems software.
Thesis: The Software Development Process Itself is THE Major Security Vulnerability in Current Computing and Communications Practices

This is because the feasibility of the process depends entirely on the re-use of concrete implementations of algorithms: software developers must routinely download and run code simply in order to do their work. The quantity of code that they download and run is far, far greater than they could possibly inspect, even cursorily. Therefore software developers are constantly running unknown code. Any such code they run as part of their work typically has both read and write access to all the other code they work on, because they typically use the same machine user account for all their work. On Unix systems this is practically mandated: account setups are not easy to duplicate, and switching between accounts is awkward, because it is almost impossible to transfer the 'desktop' setups and running applications from one user account to another without stopping and restarting them. And of course, if such switching of running programs between accounts were made automatic, there would be no point in using separate accounts for different projects.

The security situation is further aggravated by the fact that software developers have a "trusted status" amongst the computer-using community, on account of their specialist skills. Because of this perception, there is a greater tendency to _actually trust_ the words and deeds of software developers. Thus the general public are on the whole _far less_ likely to question the actions, intentional or unintentional, of software developers. This tendency to question is inversely proportional to the perceived status of the software developer, which is in turn proportional to the extent to which the results of their work are distributed. So this is contrary (doubly so) to the healthier, purely rational attitude, which would be to treat the actions of _all_ software developers with the utmost suspicion, and the better known they are, the more so! And since the products of software developers are frequently duplicated _en masse_ and then widely distributed, the developers' own vulnerability is amplified yet further as it extends transitively to the consumers of their software.

Problem: Clearly then, the fewer software developers there are, the better; and the less those few developers do, the better. But many people seem to _enjoy_ developing software. Should they be deprived of that right? The first question is whether it is really software development itself that they find enjoyable, and not just some particular aspects of it which are simply less evident in other activities. The fact that software developers often have other interests in common indicates that it is probably not software development as such. The second question is related to the first: we must ask _why_ they enjoy software development. If software development were practised differently, would they still enjoy it? For example, if software developers had to pay constant attention to issues of security, in order to ameliorate the vulnerability we have identified, would they find it equally enjoyable? Every indication is that most would not. Software projects with strict security criteria are not as popular with developers as others which are typically more relaxed, and where developers perceive they have far greater freedom, and therefore many more opportunities to be creative.
This accords with our tentative answer to the first question, because the interests that software developers have most in common seem to have to do with the creative use of imagination in areas where people can exercise intelligence. So if we could only find a way to re-employ all the people who enjoy "coding" in an area where they have far more freedom and opportunity to express creative intelligence, then everyone would be better off, except those people who rely on the products of software developers, because they need software in order to be able to express _their_ creative intelligence! Therefore we need at the same time to find a way to produce software without using human "coders".

Solution: It is widely held that "human coders" are absolutely necessary, because machines simply aren't intelligent enough to write programs. The premise is probably true: machines cannot be programmed to be creative, because creativity is, almost by definition, NOT mechanical. But the conclusion is an error, and that it is so is clear as soon as one recalls the difficulty we identified that people have with producing _secure_ software: namely, that there are _fewer_ opportunities to creatively express intelligence. The production of secure software requires a formal discipline, and the results must be in some sense more reproducible and predictable than those of an imaginative and creative expression of intelligence would be. Therefore it is very likely that machines _could_ in fact be used to carry out predictable, disciplined coding, controlled by formal rules which could be shown to result, when followed strictly, in the production of secure, reliable software.

Some may object that this is all very well for communications protocols and device drivers, but how could software produced by rigidly following formal rules be used to express, say, artistic creativity? For example, what sort of a computer game could be produced by a program following formal rules? What sort of music could be produced by a computer following formal rules which determined the notes that were played? This is again an error, one of confusing the medium with the message: just because the rules are rigid doesn't mean the results are not creative expression. One need only listen to the music of J.S. Bach, or read a Shakespeare sonnet, to see this.

Of course the same applies to the process of generating programs. Software written by a formal process following carefully constructed rules, as well as being more correct, is in fact far more beautiful than that constructed _ad hoc._ This is because there is far more opportunity to change the expression of a formally specified process: one simply changes the specification of the generating process itself, and then re-generates the program. And because the process is controlled by a computation, the result is typically far more elaborate than any human mind could conceive. The evidence for this is in the quite spectacular results which have been achieved using mechanical processes to generate Fast Fourier Transform implementations. The FFTW program's elaboration of formal specifications has produced several elegant optimizations of the basic algorithm for special cases, which cases had not hitherto been published in the technical literature. The same process also reproduced some of the published algorithms, which until then had very probably been considered to be the results of an essential intelligence. A toy sketch of such a generator follows.
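The sketch below is our own, in the spirit of FFTW's generator, genfft (which happens also to be written in OCaml, though it is vastly more sophisticated); the names, and the crude tolerance-based snapping of constants, are assumptions of this illustration, not features of genfft. The "specification" is the textbook DFT sum; the generator unrolls it for a fixed size, simplifies the resulting expression tree, and emits straight-line code.

    (* A toy code generator: unroll the DFT specification for a fixed
       size, simplify, and emit straight-line assignments. *)
    type exp =
      | Num of float
      | Var of string
      | Add of exp * exp
      | Mul of exp * exp

    (* snap twiddle constants to exact values; genfft does this exactly,
       we cheat with a tolerance *)
    let num x =
      if Float.abs x < 1e-12 then Num 0.
      else
        let r = Float.round x in
        Num (if Float.abs (x -. r) < 1e-12 then r else x)

    (* algebraic simplification: prune additions of 0, multiplications
       by 0 and 1, and fold constants *)
    let rec simp e =
      match e with
      | Add (a, b) ->
          (match simp a, simp b with
           | Num 0., x | x, Num 0. -> x
           | Num x, Num y -> Num (x +. y)
           | x, y -> Add (x, y))
      | Mul (a, b) ->
          (match simp a, simp b with
           | Num 0., _ | _, Num 0. -> Num 0.
           | Num 1., x | x, Num 1. -> x
           | Num x, Num y -> Num (x *. y)
           | x, y -> Mul (x, y))
      | e -> e

    let rec emit = function
      | Num x -> Printf.sprintf "%g" x
      | Var v -> v
      | Add (a, b) -> Printf.sprintf "(%s + %s)" (emit a) (emit b)
      | Mul (a, b) -> Printf.sprintf "(%s * %s)" (emit a) (emit b)

    (* the formal specification: X[k] = sum_j x[j] * e^(-2 pi i j k / n) *)
    let dft n =
      let pi = 4.0 *. atan 1.0 in
      List.init n (fun k ->
          let re = ref (Num 0.) and im = ref (Num 0.) in
          for j = 0 to n - 1 do
            let t = -2.0 *. pi *. float_of_int (j * k) /. float_of_int n in
            let c = num (cos t) and s = num (sin t) in
            let xr = Var (Printf.sprintf "xr%d" j)
            and xi = Var (Printf.sprintf "xi%d" j) in
            (* accumulate (xr + i*xi) * (c + i*s) into re and im *)
            re := Add (!re, Add (Mul (xr, c), Mul (Num (-1.), Mul (xi, s))));
            im := Add (!im, Add (Mul (xr, s), Mul (xi, c)))
          done;
          (k, simp !re, simp !im))

    let () =
      List.iter
        (fun (k, re, im) ->
          Printf.printf "Xr%d = %s;  Xi%d = %s;\n" k (emit re) k (emit im))
        (dft 2)

Already at size 2 this yields the familiar butterfly, with every multiplication by zero or one specialized away; genfft additionally performs, among much else, exact constant arithmetic, common-subexpression elimination and instruction scheduling. The point to notice is that changing the expression of the generated program means changing `dft` or `simp` and re-generating, never editing the output.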
But the intelligence is not in the formal system: it is being expressed _through_ the _use_ of the system, not _by_ the system itself. This is probably not very different from the intelligent human being's use of the formal system of mathematics to express intelligent thought. In highly specialised mathematical fields, people use well-understood _methods_ to _derive_ results. For example, in [1], which is available from https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html, Edsger Dijkstra mentions a method of deriving concurrent algorithms together with their correctness proofs. It is disarmingly simple:

"And then something profound and lovely happened. By analyzing by what structure of argument the proof obligations could be met, the numerical mathematician Th.J. Dekker designed, within a few hours, the above solution, together with its correctness argument, and this settled the contest. [...] the pair "c1,c2" implements the mutual exclusion, while "turn" has been introduced to resolve the tie when the two processes simultaneously try to enter their critical sections."

Similarly, a mathematician working on an optimization for a special case of the FFT algorithm might use a combination of analysis and synthesis as a more or less formal process by which she can derive certain classes of solution, perhaps also with correctness proofs. There are in fact numerous examples of this in the computing literature, although it is seldom if ever stated that this is a recognised technique that is being used. A very obvious example is the Hindley-Milner type inference algorithm as described by Damas and Milner in [2]. Here soundness and completeness are described in terms of a set of formal rules of deduction, and the structure of the algorithm W, which infers the type of a given expression, very closely follows the available rules in the inference systems for type judgements and semantic truth. Another good example is John Harrison's elegant treatment of exact real arithmetic in [3]. Here a series of proofs of the required properties of the arithmetic operations on real numbers, expressed to any given precision, is used to produce the algorithms, which are written as ML functions to compute those values. The system resulting from this specification is therefore automatically, because _by definition,_ formally correct. A small sketch in this style is given below. I am sure numerous other examples could be found. Often all that is necessary is a subtle shift of one's point of view, to see the process of deriving the algorithm as a proof process, or vice-versa, depending on the point of view from which the author writes the description. Whichever it is, one obtains the unifying process by adding the opposite, unstated aspect: the proof, or the algorithm.

Changing the process of generation of a program is impossible when the program was written ad hoc, unless it is so uniform and regular that it could in fact have been produced by a formal mechanical process, because then one can simply treat the program itself as a formal specification. But unless it has a highly uniform, regular pattern, one not so elaborate that its abstract form is imperceptible to a human being (*), an ad-hoc program cannot be restructured automatically. Ask anyone who has applied a large number of patches, from different sources, to an operating system kernel, say, and they will tell you this is indeed so. And to see the truth of the clause (*), try to abstract the form of the FFTW generating algorithm from a visual inspection of the code it produces in different cases.
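The following is our rendering, much simplified, of the style of [3]: a real number x is represented by a function which, for each n >= 0, returns an integer within 1 of 2^n * x, and each operation is transcribed directly from the error bound that proves it correct. The helper names are ours.

    (* exact real arithmetic: x is any function returning, for each n,
       an integer within 1 of 2^n * x *)
    type real = int -> int

    (* integer division rounding to nearest (d > 0) *)
    let rnd_div a d =
      if a >= 0 then (a + d / 2) / d else - ((- a + d / 2) / d)

    let of_int k : real = fun n -> k * (1 lsl n)   (* exact: 2^n * k *)

    let neg (x : real) : real = fun n -> - (x n)   (* error is unchanged *)

    (* x + y: evaluate both with two guard bits; the accumulated error
       (< 2) falls below 1/2 on division by 4, and rounding to nearest
       adds at most 1/2, so the result is again within 1 of 2^n*(x+y) *)
    let add (x : real) (y : real) : real =
      fun n -> rnd_div (x (n + 2) + y (n + 2)) 4

    (* evaluate x to d binary digits, printed as a float *)
    let approx (x : real) d = float_of_int (x d) /. float_of_int (1 lsl d)

    let () =
      (* 1/3: round(2^n / 3) is within 1/2 of 2^n / 3, hence valid *)
      let third : real = fun n -> rnd_div (1 lsl n) 3 in
      let one = add third (add third third) in
      Printf.printf "1/3 + 1/3 + 1/3 ~ %.8f\n" (approx one 24);
      Printf.printf "1 - (1/3 + 1/3 + 1/3) ~ %.8f\n"
        (approx (add (of_int 1) (neg one)) 24)

Multiplication is handled in the same manner, except that the proof obligation there first requires a bound on the magnitudes of the operands in order to choose the number of guard bits; that is exactly the point at which the correctness argument shapes the algorithm, and we omit it here.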
One should not think of this process of proving correctness as a highly abstract one. It is no more or less abstract than the algorithm itself, because in this view the two things, the proof and the algorithm, ARE THE SAME THING, seen from two different points of view. This is a common experience, because people only feel they actually understand an algorithm when they can SEE IMMEDIATELY WHY IT REALLY DOES WORK. This is why NOBODY really understands public-key cryptography algorithms: because no-one actually knows WHY they are secure. Their security, such as it is, rests not on knowledge, but solely on the ABSENCE of knowledge that they are NOT secure. So they provide only a certificate, not actual knowledge. That's what ALL certificates are for: they are used ONLY when actual knowledge is not strictly necessary, because it is expected to be available at some subsequent stage of a process. The certificate is therefore used solely for the purpose of increasing efficiency, in cases where it is more likely than not that, when it becomes available, the actual knowledge will concur with the certificate representing it.

There is a quite well-known problem on which the reader may enjoy trying out this technique. One is set the task of determining which of a certain number of otherwise indistinguishable objects (coins or footballers, in the two instances we have heard) is either lighter or heavier than all the others, which are of the same weight, and whether that object actually is lighter or heavier. The difficulty is that the only measuring device one has available is a balance with fixed-length arms, and in each version of the problem there is a certain maximum number of weighings that one is allowed. The task is to describe an _infallible method_ of determining the _correct_ answer. In some cases this is obvious, and in others it is clearly impossible: for example, if one is allowed three weighings to distinguish between 64 possibilities. Each weighing has only three possible outcomes, so the absolute maximum number of distinguishable outcomes of three weighings is 3*3*3=27, and one cannot hope to distinguish the one correct answer from among 64 equally probable ones. In some of these problems the solution is not immediately obvious, and so it seems unlikely that one is possible, but not because of any absolute bound on the information that the results of the given number of measurements could contain. Of course these are the more interesting cases, and one can use the algorithmic method of analysis and synthesis Dijkstra mentions to solve them, or indeed to determine that there is no possible solution. One need only consider how one would convince _someone else_ that the result the method determines is correct in every possible case; analysing the forms of those arguments then _automatically_ yields the algorithm, together with its correctness proof. Of course, when the number of available measurements is critical, as it is in the interesting cases, it is important to make only those measurements which TOGETHER yield the most information overall. So "maximizing the entropy," as it were (choosing those measurements which discriminate most between the possible outcomes of the WHOLE series of measurements), is the guiding principle which determines the algorithm that this method generates. The process as we have just described it is typical of the way inference problems are tackled in Artificial Intelligence applications, such as those beautifully described by E.T. Jaynes in his book "Probability Theory: The Logic of Science". A brute-force rendering of the analysis follows.
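The sketch below (ours, and for small n only, since the enumeration is exponential) renders the analysis directly as a program: a "hypothesis" is a pair of a coin and Lighter or Heavier; a weighing partitions the hypothesis set by its three outcomes; and a strategy with k weighings exists precisely when some weighing leaves every outcome class solvable in k-1 weighings. The correctness argument IS the recursion.

    type side = Lighter | Heavier

    (* outcome of weighing pans l and r under a hypothesis:
       1 = left pan down, -1 = right pan down, 0 = balance *)
    let outcome l r (c, s) =
      let v = if List.mem c l then 1 else if List.mem c r then -1 else 0 in
      match s with Heavier -> v | Lighter -> -v

    (* all assignments of coins to Left/Right/Off with equal, nonempty
       pans (the balance has fixed-length arms) *)
    let weighings coins =
      let rec go = function
        | [] -> [ ([], []) ]
        | c :: rest ->
            List.concat_map
              (fun (l, r) -> [ (c :: l, r); (l, c :: r); (l, r) ])
              (go rest)
      in
      List.filter (fun (l, r) -> l <> [] && List.length l = List.length r)
        (go coins)

    (* an infallible method exists iff some weighing splits the
       hypotheses so that every outcome class is solvable in k-1 *)
    let rec solvable coins hyps k =
      match hyps with
      | [] | [ _ ] -> true          (* at most one hypothesis: answer known *)
      | _ when k = 0 -> false
      | _ ->
          List.exists
            (fun (l, r) ->
              List.for_all
                (fun o ->
                  solvable coins
                    (List.filter (fun h -> outcome l r h = o) hyps)
                    (k - 1))
                [ -1; 0; 1 ])
            (weighings coins)

    let () =
      let test n k =
        let coins = List.init n (fun i -> i) in
        let hyps =
          List.concat_map (fun c -> [ (c, Lighter); (c, Heavier) ]) coins
        in
        Printf.printf "%d coins, %d weighings: %b\n" n k (solvable coins hyps k)
      in
      test 3 2;   (* solvable *)
      test 4 2    (* not solvable, even though 2*4 = 8 <= 9 = 3^2 *)

Ordering the candidate weighings by how evenly they split the hypothesis set, rather than trying them in arbitrary order, is precisely the entropy-maximizing principle described above, and it turns this existence check into a derivation of the strategy itself.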
So we hope that we have allayed any fears the reader may have had that the process of deriving correctness proofs is highly abstract, and involves a great deal of incomprehensible logical and/or mathematical symbols, and persuaded her instead that it is really an intuitive process, and in a strong sense one that is IDENTICAL with the actual understanding of the algorithm with which it deals. We can now go on to consider what Dijkstra's method has to do with coding secure software.

The problem with all security is what one might call "the leaking abstraction." This is always a result of some fixed concrete representation of data as being encoded by _information_ of some kind. The leak occurs when that encoding is redundant, so that there are many different possible encodings of the same data. Simple examples are:

(1) A forged bank-note, where there are two or more representations of the same one note. Each note carries a unique identifying number, which is its certificate of authenticity, and the difficulty of manufacturing a convincing replica means that the certificate is useful, because it is more likely than not that 'at the end of the day' the real note will be matched with the actual object of value it represents: some fraction of some particular gold bar, for example.

(2) A person in disguise, masquerading as another person who has some unique privilege; for example, the right of entry to some secure facility, such as a military compound. Here there is redundancy introduced into the representation of the individuals, so that, as in the case of the forged bank-note, the successful infiltrator compromises the ability of the experimental method to infallibly discriminate between those who have this privilege and those who do not.

(3) 'Picking' the five-lever lock on a door. Here the redundancy is in the representation of the 'certificate' which signifies the authenticity of the tool used to open the door: the lock-picker's tools are another representation of the genuine key, and the method the levers implement is not sufficient to discriminate between genuine and false 'certificates.' The same principle of course applies to an illicit copy of the key.

(4) 'Tempest' decoding of the electromagnetic emissions from an LCD connector on a flat-panel computer display. This is possible because of the repeated scanning of similar images successively displayed on the screen. This causes multiple, repeated transmissions of similar signals, which allow an 'attacker' to deconvolve the signal and recover the original modulating pattern, which is the image being displayed on the screen. If the computer transmitted only the changes to individual pixels as they changed, this deconvolution would not in general be possible.

Hopefully the reader will easily be able to produce many, many more examples of what are all essentially different forms of the same fundamental phenomenon. Abstractions which are realized in fixed concrete representations invariably leak, simply because they introduce redundancy into the representation of what are supposed (or assumed) to be unique events. So if we could only find a method of making abstractions in such a way that their concrete representations were always changing, then we would have solved the problem of computer and communications security in general, because there would be no fixed 'certificate of identity' for the 'attacker' to forge. A minimal sketch of such a changing representation follows.
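The sketch below is our own construction, not code from [4]: clients hold opaque identifiers; only the module can map them to "physical pages"; and the mapping can be reshuffled at any moment, preserving the clients' abstract view while destroying any fixed concrete representation an attacker might key into.

    (* an abstraction whose concrete representation keeps changing *)
    module Mapper : sig
      type segment                  (* opaque: cannot be forged outside *)
      val alloc : unit -> segment
      val page_of : segment -> int  (* exposed only to demonstrate remapping *)
      val shuffle : unit -> unit    (* give every segment a fresh page *)
    end = struct
      type segment = int
      let table : (int, int) Hashtbl.t = Hashtbl.create 16
      let next_seg = ref 0
      let next_page = ref 0
      let fresh_page () = let p = !next_page in incr next_page; p
      let alloc () =
        let s = !next_seg in
        incr next_seg;
        Hashtbl.replace table s (fresh_page ());
        s
      let page_of s = Hashtbl.find table s
      let shuffle () =
        (* remap every segment; in a real system the contents would move too *)
        let segs = Hashtbl.fold (fun s _ acc -> s :: acc) table [] in
        List.iter (fun s -> Hashtbl.replace table s (fresh_page ())) segs
    end

    let () =
      let s = Mapper.alloc () in
      Printf.printf "segment -> page %d\n" (Mapper.page_of s);
      Mapper.shuffle ();
      Printf.printf "same segment -> page %d\n" (Mapper.page_of s)

This also prefigures the virtual page-mapping process discussed next.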
And, dually, there would be no redundancy for the 'attacker' to 'key into' in the information which represents the data that the system processes. In [4], Dijkstra presents a method of designing 'operating systems' in which layers of abstraction are successively introduced, one on top of the other, and in which some concrete representation of the physical hardware of the machine 'disappears' as each new abstraction is added. The disappearing physical objects are replaced by multiple virtual representations of objects with the same or similar functionality: forgeries, as it were, of the real, identifiable hardware. The paper does not give detailed examples of how this is done, but it seems to us more than likely that, by using Dijkstra's other idea of designing algorithms through an analysis of what would be required of a proof of correctness, and by taking this analysis to be guided by a principle of maximizing the information content (i.e. minimizing the redundancy) of successive measurements, one could formulate a method for deriving a general class of system designs of the kind Dijkstra describes. One would start by considering what one would need to do to convince someone else that a system is secure, i.e. that the data are represented concretely with minimal redundancy or, equivalently, maximum entropy: for example, that the data stored in a disk or core memory are 'mixed up' as much as possible. This is achieved in Dijkstra's example system by virtualising the physical pages: addresses are stored as 'segments' which are mapped by a virtual process onto physical pages of memory, so it is only via the virtual page-mapping process that the segment identifiers are meaningful (compare the sketch above). But the virtual page-mapping process is not visible to any other process, except via the highly restrictive synchronisation primitives through which processes are constrained to communicate.

In [1] Dijkstra mentions:

"In the beginning I knew that programming posed a much greater intellectual challenge than generally understood or acknowledged, but my vision of the more detailed structure of that vision was still vague. Halfway [through] the multiprogramming project in Eindhoven I realized that we would have been in grave difficulties had we not seen in time the possibility of definitely unintended deadlocks. From that experience we concluded that a successful systems designer should recognize as early as possible situations in which some theory was needed. In our case we needed enough theory about deadlocks and their prevention to develop what became known as "the banker's algorithm", without which the multiprogramming system would only have been possible in a very crude form. By the end of the period I knew that the design of sophisticated digital systems was the perfect field of activity for the Mathematical Engineer."

[1] Edsger W. Dijkstra, EWD1303. https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html
[2] Luis Damas and Robin Milner, "Principal Type-schemes for Functional Programs," in Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, 1982, pp. 202-212. http://prooftoys.org/ian-grant/hm/milner-damas.pdf
[3] John Harrison, Introduction to Functional Programming. http://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/all.pdf
[4] Edsger W. Dijkstra, "The Structure of the 'THE'-Multiprogramming System," Commun. ACM 11, 5 (1968), 341-346. http://www.cs.utexas.edu/users/EWD/ewd01xx/EWD196.PDF