Gabriel Dos Reis wrote:
What I'm claiming is that he thinks "undefined behaviour" in the
standard should not be taken as meaning "go to hell" (or punishment, to
borrow words from you) or absolute liberty for compiler writers to do
just about everything that is imaginable, regardless of expectations.
In other words, it is a question of balance.
The trouble is that the standard does not formalize the balance, and
that is why we fall back on rhetoric. The real issue is whether the compiler
optimizer should be allowed to assume that there is no overflow.
At first glance, it seems reasonable to say yes; that is, after all, what
the undefined semantics is about. The trouble is, as I have pointed out
before, and tried to illustrate with my password example, that this
assumption, propagated by clever global flow algorithms, can have very
unexpected results.
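
To make the point concrete, here is a minimal C sketch (my illustration,
not the actual password example) of how a hand-written wrap-around check
can disappear once the optimizer assumes signed overflow never happens:

   /* Illustration only: the programmer's own wrap-around test uses signed
      arithmetic.  An optimizer that assumes signed overflow cannot happen
      may reason that a > 0 && b > 0 implies a + b > 0, fold the test to
      false, and delete the error path entirely. */
   int add_checked (int a, int b)
   {
     if (a > 0 && b > 0 && a + b < 0)   /* intended overflow check */
       return -1;                       /* error path the programmer expects */
     return a + b;
   }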
So what to do?
We could try to formalize the notion that although overflow is undefined,
this information cannot be propagated, or perhaps more specifically cannot
be propagated backwards. But in practice this is very difficult to formalize,
and it is very hard to put bounds on what propagation means in this case.
We have a sort of informal notion that says the kind of thing that occurred
in my password example is a bad thing and should not happen, but we have a
lot of trouble formalizing it (I know of lots of attempts to do this, but no
successes; does anyone know of a success story here?). Note that a similar issue
arises with assertions: is the compiler allowed to assume that assertions
are true and propagate this information?
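
Here is a small C sketch of what I mean by the assertion question (my
illustration; the table and the bound are made up):

   #include <assert.h>

   /* Illustration only.  When the assertion is actually checked, the
      compiler may of course use the condition in the code after the check.
      The open question is whether it may also assume it when NDEBUG
      compiles the assert away, and whether it may propagate it backwards
      to code before the assertion. */
   int lookup (const int *table, int i)
   {
     assert (i >= 0 && i < 100);   /* the programmer's claim */
     if (i < 0 || i >= 100)        /* defensive check: dead code if the
                                      assertion may be assumed true */
       return -1;
     return table[i];
   }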
We could try to avoid the use of the general notion of undefined behavior.
After all, if Stroustrup doesn't like the consequences of unbounded
undefinedness, then presumably he doesn't like the language of the standard.
It is the standard that introduces this notion of unbounded undefinedness,
not me :-)
In fact I think the attempt to avoid unbounded undefinedness (what Ada
calls erroneous execution) is something that is very worthwhile. One
of the major steps forward in the Ada 95 standard over the Ada 83 standard
was that we eliminated many of the common cases of erroneousness and replaced
them with bounded errors.
For example, consider the case of uninitialized variables. In Ada 83, a
reference to an uninitialized scalar variable caused the program to be
erroneous. In Ada 95, such an object may instead have an invalid representation.
Referencing an uninitialized variable is now not erroneous, but rather
a bounded error. The possible effects are defined as follows in the RM:
9 If the representation of a scalar object does not represent a value of
the object's subtype (perhaps because the object was not initialized), the
object is said to have an invalid representation. It is a bounded error to
evaluate the value of such an object. If the error is detected, either
Constraint_Error or Program_Error is raised. Otherwise, execution continues
using the invalid representation. The rules of the language outside this
subclause assume that all objects have valid representations. The semantics
of operations on invalid representations are as follows:
10 If the representation of the object represents a value of the
object's type, the value of the type is used.
11 If the representation of the object does not represent a value of
the object's type, the semantics of operations on such
representations is implementation-defined, but does not by itself
lead to erroneous or unpredictable execution, or to other objects
becoming abnormal.
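
For contrast, here is a rough C sketch (mine, not from either standard) of
the corresponding uninitialized-variable case, which in C remains plain
undefined behavior rather than a bounded error:

   /* In C the read of 'flag' uses an indeterminate value (in C11 this is
      undefined behavior for an automatic variable whose address is never
      taken), so the optimizer may propagate anything at all from it.
      Under the Ada 95 rule above, the implementation may raise
      Constraint_Error or Program_Error, or compute with the junk bit
      pattern, but the bad value may not by itself make the rest of the
      execution unpredictable. */
   int pick (void)
   {
     int flag;               /* never initialized */
     return flag ? 1 : 0;    /* bounded error in Ada terms; undefined in C */
   }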
Gabriel Dos Reis also wrote:
As an example, he illustrated the issue with the story -- quickly classified
as "legend" by Robert Dewar -- about the best of the C optimizing compilers,
which miscompiled the Unix kernel so that nobody wanted to use it in practice
(even if it would blow up any other competing compiler), and the company
going out of business.
Actually I asked for documentation of this claim, and as far as I remember
no one provided it, which is a pity. It is really useful to have examples
of real-life situations where sticking to the standard and ignoring real-world
practice resulted in useless compilers. I gave one such example,
the Burroughs Fortran compiler.
As another example, I have always suggested in class that a C compiler
generating code for a flat address space would be a menace if it gave
random results when comparing addresses of different allocated
objects. I assume no C compiler does this in practice.
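
The kind of comparison I have in mind looks like this C sketch (my example,
with made-up names):

   #include <stddef.h>

   /* Formally, p < q is undefined in C unless both pointers point into
      the same array object (or one past its end).  On a flat address
      space programmers nevertheless expect it to behave as a plain
      integer comparison, for example in an overlap test between two
      separately allocated buffers. */
   int regions_overlap (const char *p, size_t np, const char *q, size_t nq)
   {
     return p < q + nq && q < p + np;
   }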
So the Unix kernel example would be really nice to pin down. What compiler
was this? And what was the "legitimate" freedom that it took that it should
not have taken?
By the way, I generally agree that C is a language where the expected behavior
is quite a bit larger than the formally defined behavior (which I consider a
bad characteristic of a language).
At the other extreme, in the case of COBOL, when we wrote Realia COBOL, there
were a number of places where we deliberately deviated from the COBOL standard.
Why? Because there were bugs in the IBM mainframe implementations, and it was
more important to maintain compatibility with these bugs than to do what the
standard demanded :-)