Re: unions and aliasing

Paul Eggert Wed, 11 Apr 2007 17:50:13 -0700

Bruno Haible <[EMAIL PROTECTED]> writes:

> In C99 6.2.6.1.(4) there is a definition of the term "object representation",
> together with a guarantee that memcpy() can be used to extract it.


Yes, that's correct.  But that text allows you to extract the
representation only into an array of unsigned char; it is not a
license to copy the representation into storage of other types.
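
To make the distinction concrete, here is a minimal sketch of the kind
of extraction that 6.2.6.1 does cover: copying the bytes of a long
double into an array of unsigned char, and copying them back into an
object of the same type.  The helper names are hypothetical and are
not from gnulib or from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of *x into OUT,
   which must have room for at least sizeof *x bytes.  Return the
   number of bytes copied.  */
size_t
long_double_to_bytes (long double const *x, unsigned char *out,
                      size_t out_size)
{
  assert (sizeof *x <= out_size);
  memcpy (out, x, sizeof *x);
  return sizeof *x;
}

/* Hypothetical helper: rebuild a long double from bytes that were
   produced by long_double_to_bytes.  The destination has the same
   type as the original object, so no other type is involved.  */
long double
long_double_from_bytes (unsigned char const *in)
{
  long double x;
  memcpy (&x, in, sizeof x);
  return x;
}
```

A caller would declare a buffer such as unsigned char
buf[sizeof (long double)] and inspect buf[i] as needed, rather than
reading the bytes through an int lvalue.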

> In C99 6.5.(7) there is text which explicitly allows this kind of code:
>
>    union { long double x; int y[6]; } m;
>    m.x = ...;
>    return m.y[0];

I think you're misreading the text here.  The text says an object
shall have its stored value accessed only by an lvalue expression that
has a type that is sort-of-compatible with the effective type of the
object.  By "sort-of-compatible" I mean that the type is compatible,
or it's a qualified version of a compatible type, or it's a signed
type corresponding to the type, or things like that.  But int and long
double are not sort-of-compatible.

Apparently you are relying on the following clause:

   -- an aggregate or union type that includes one of the aforementioned
      types among its members (including, recursively, a member of a
      subaggregate or contained union), or

But this does not at all give license to the above code.  It allows
you to use a union member only if that member is sort-of-compatible
with the effective type of the object.  So, for example, this is
valid:

          union { int x; int const y; } m;
          m.x = 10;
          return m.y;

However, the code that you mentioned is not valid.  For example, an
implementation is allowed to have trap representations for integers,
and the behavior is undefined if m.y[0] happens to hold one.
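
For illustration, the valid case can be written as a complete
function.  This is only a sketch; the union tag and the function name
are made up for the example.

```c
#include <assert.h>

/* Both members have type int, apart from the const qualifier, so a
   read through either one uses an lvalue whose type is a qualified
   version of the effective type of the stored value.  */
union int_views { int x; int const y; };

/* Hypothetical function: store through the plain member, then load
   through the const-qualified member.  */
int
read_via_qualified_member (int value)
{
  union int_views m;
  m.x = value;
  assert (m.y == value);  /* sanity check only */
  return m.y;
}
```

Replacing the second member with an array of int overlaying a long
double would take the code outside this case, which is the situation
discussed above.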

The point of these rules is to allow compilers to optimize based on
type-based inferencing.  And lots of compilers do that.  I'd be leery
of assuming that compilers won't optimize their way into trouble here.
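
Here is a small sketch of the sort of inference I have in mind.  The
function is hypothetical and is not taken from gnulib.  Because int
and long double are not sort-of-compatible, a compiler may assume that
the two pointers refer to different objects, and may return 1 without
reloading *ip after the store through dp.  With GCC this kind of
reasoning is controlled by -fstrict-aliasing.

```c
#include <assert.h>

/* Hypothetical function.  A caller that passes pointers into the same
   storage (say, two members of one union) violates the assumption
   that the compiler is entitled to make here.  */
int
store_then_reload (int *ip, long double *dp)
{
  /* Precondition: the pointers do not designate the same storage.  */
  assert ((void *) ip != (void *) dp);

  *ip = 1;
  *dp = 2.0L;   /* may be assumed not to modify *ip */
  return *ip;   /* may be compiled as "return 1" */
}
```

With distinct objects the result is the same either way.  The trouble
starts only when the caller overlays the two types, and whether it
shows up depends on the compiler and the options in use.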


> In C99 6.7.2.1.(14) there is text that allows to cast a pointer to a member
> of the union to a pointer to the union itself, and vice versa. Hence:
>
>    union { long double x; int y[6]; } m;
>
> implies
>
>    (void*) &m.x == (void*) &m == (void*) &m.y.

Yes, that is true, but that does not mean that you can store a long
double and then load an int and get well-defined behavior.  It merely
means their addresses will be equal.


> So I don't even buy that the standard is muddy on this area. It looks rather
> like some implementations (including GCC) may not implement 6.5.(7) item 5
> ("an aggregate or union type ...") correctly.

No, I'm afraid you've misinterpreted the standard.  It's fairly clear
(though I guess not clear enough :-) and GCC is interpreting it
correctly.


> 1) In http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm

But example 2 of
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm> clearly
violates the C standard, and yet it would seem to be OK according to
the interpretation that you are thinking of.


>      "The current situation requires more consideration,

The only consideration that the committee is worried about is how to
say exactly that you can't mess with malloc results in the way that
the example does.


>      http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html
>       As a practical matter, reading and writing to any
>       member of a union, in any order, is acceptable practice."

That's being way too optimistic in general: it's certainly not true
for many compilers.  Perhaps it is true for the Cell compilers, but
even there I'm skeptical.


> 3) The gcc documentation is clear about it too:

Yes, I agree that the behavior is well-defined in GCC if all accesses
are via the union.  However, I doubt whether it's true for all
compilers.


> Do you have an indication that the problems faced in the Linux kernel
> apply also to the case of references through the union variable, or only
> when using pointers to the union members?

No, I am mostly relying on my intuition and feel here.  Clearly the
behavior doesn't conform to the standard, and my guess is that some
compilers won't do what you expect here.


> I'm not worried by it, because if a compiler does this optimization,
> the unit test will catch it.

I doubt whether the unit test will suffice to catch it.  Many compiler
optimizations are tricky and are done in some cases but not others.
The unit test may not tickle the cases in question.  Furthermore, the
unit test may be compiled with different options than the production
code.


> 'unsigned int' accesses are an epsilon more efficient than 'unsigned
> char' accesses

In this particular case I don't think the epsilon is worth the danger
of relying on undefined behavior.
