Bruno Haible <[EMAIL PROTECTED]> writes:

> In C99 6.2.6.1.(4) there is a definition of the term "object representation",
> together with a guarantee that memcpy() can be used to extract it.

Yes, that's correct. But that text allows you to extract the
representation only into an array of unsigned char; it is not a
license to copy the representation into storage of other types.

> In C99 6.5.(7) there is text which explicitly allows this kind of code:
>
>   union { long double x; int y[6]; } m;
>   m.x = ...;
>   return m.y[0];

I think you're misreading the text here. The text says an object
shall have its stored value accessed only by an lvalue expression that
has a type that is sort-of-compatible with the effective type of the
object. By "sort-of-compatible" I mean that the type is compatible,
or it's a qualified version of a compatible type, or it's a signed
type corresponding to the type, or things like that. But int and long
double are not sort-of-compatible.

Apparently you are relying on the following clause:

  -- an aggregate or union type that includes one of the
     aforementioned types among its members (including, recursively,
     a member of a subaggregate or contained union), or

But this does not at all give license to the above code. It allows
you to use a union member only if that member is sort-of-compatible
with the effective type of the object. So, for example, this is
valid:

  union { int x; int const y; } m;
  m.x = 10;
  return m.y;

However, the code that you mentioned is not valid. For example, an
implementation is allowed to have trap representations for integers,
and it is allowed to have undefined behavior if m.y[0] happens to have
a trap representation.

The point of these rules is to allow compilers to optimize based on
type-based inferencing. And lots of compilers do that. I'd be leery
of assuming that compilers won't optimize their way into trouble here.

> In C99 6.7.2.1.(14) there is text that allows to cast a pointer to a member
> of the union to a pointer to the union itself, and vice versa. Hence:
>
>   union { long double x; int y[6]; } m;
>
> implies
>
>   (void*) &m.x == (void*) &m == (void*) &m.y.

Yes, that is true, but that does not mean that you can store a long
double and then load an int and get well-defined behavior. It merely
means their addresses will be equal.

> So I don't even buy that the standard is muddy on this area. It looks rather
> like some implementations (including GCC) may not implement 6.5.(7) item 5
> ("an aggregate or union type ...") correctly.

No, I'm afraid you've misinterpreted the standard. It's fairly clear
(though I guess not clear enough :-) and GCC is interpreting it
correctly.

> 1) In http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm

But example 2 of
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm> clearly
violates the C standard, and yet it would seem to be OK according to
the interpretation that you are thinking of.

> "The current situation requires more consideration,

The only consideration that the committee is worried about is how to
say exactly that you can't mess with malloc results in the way that
example does.

> > http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html
> As a practical matter, reading and writing to any
> member of a union, in any order, is acceptable practice."

That's being way too optimistic in general: it's certainly not true
for many compilers. Perhaps it is true for the Cell compilers, but
even there I'm skeptical.

> 3) The gcc documentation is clear about it too:

Yes, I agree that the behavior is well-defined in GCC if all accesses
are via the union. However, I doubt whether it's true for all
compilers.

> Do you have an indication that the problems faced in the Linux kernel
> apply also to the case of references through the union variable, or only
> when using pointers to the union members?

No, I am mostly relying on my intuition and feel here. Clearly the
behavior doesn't conform to the standard, and my guess is that some
compilers won't do what you expect here.

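To illustrate the kind of type-based inference at issue, here is a
minimal sketch. The function name store_then_load is hypothetical and
does not come from the code under discussion. Because int and long
double are not compatible types, a compiler may assume that the two
pointers do not refer to overlapping storage:

```c
/* Hypothetical illustration, not taken from the code under discussion.
   int and long double are not compatible types, so a compiler may
   assume that *ip and *lp do not overlap.  */
int
store_then_load (int *ip, long double *lp)
{
  *ip = 1;
  *lp = 0.0L;   /* may be assumed not to modify *ip */
  return *ip;   /* may be compiled as "return 1" with no reload */
}
```

Called with pointers to distinct objects, this returns 1. Called with
pointers to two members of the same union, the behavior is undefined by
the reasoning above, and an optimizing compiler might still return 1
instead of reloading *ip.
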
> I'm not worried by it, because if a compiler does this optimization,
> the unit test will catch it.

I doubt whether the unit test will suffice to catch it. Many compiler
optimizations are tricky and are done in some cases but not others.
The unit test may not tickle the cases in question. Furthermore, the
unit test may be compiled with different options than the production
code.

> 'unsigned int' accesses are an epsilon more efficient than 'unsigned
> char' accesses

In this particular case I don't think the epsilon is worth the danger
of relying on undefined behavior.
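
For comparison, here is a sketch of the unsigned char approach that
6.2.6.1 does permit. The helper name long_double_bytes and its
signature are assumptions made for this example, not part of the code
under discussion:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of X into BUF,
   an array of unsigned char.  Return the number of bytes copied, or 0
   if BUF (of size BUFSIZE) is too small.  The copied bytes may include
   padding whose values are unspecified.  */
size_t
long_double_bytes (long double x, unsigned char *buf, size_t bufsize)
{
  size_t n = sizeof x;
  if (bufsize < n)
    return 0;
  memcpy (buf, &x, n);   /* copying into unsigned char[] is allowed */
  return n;
}
```

A caller would declare "unsigned char buf[sizeof (long double)];" and
then inspect buf[0], buf[1], and so on. This costs the epsilon
mentioned above but stays within defined behavior.
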