------- Comment #4 from kai dot extern at googlemail dot com  2010-02-27 13:46 -------
> You are violating c++ aliasing rules.  You access a uint8_t via
> different types.

Actually, I address other types via uint8_t, but that is neither here nor
there. (Oh, I just realized you probably didn't mean the union but the load in
mem2int. But the following comments apply just as well to that part.)

First, as far as I can tell, the code produced is clearly *correct*.

Second, the optimization of everything except uint64_t shows that gcc clearly
understands what I'm trying to do. In fact, my first attempts had no
type-punning whatsoever; however, the resulting code demonstrated that gcc had
no clue what I was trying to do - it was pretty much completely unoptimized.
So I went looking for ways to describe the problem that gcc would actually
understand. This particular solution grew out of some other gcc bug which
compared various ways of determining endianness, and showed that currently,
the version with unions is the only one gcc can optimize - that seems to be a
regression introduced with 4.0, and the bug seems to be still open.

Anyway, my point here is that gcc *does* understand this idiom. I see two
problems:

1. For some reason, gcc can optimize the byte_sex function to a constant -
   except when the integer is 64 bits long. It is not obvious whether the
   problem is in the 64 bits or in the 8 loop iterations, but something does
   not work there which works in the smaller cases.

2. The actual byte swapping code (which has nothing whatsoever to do with any
   aliasing) is clearly suboptimal in some cases. Again, it is not obvious
   what property causes gcc to generate this one just fine for some cases, and
   not very fine for others. The basic logic is obviously the same.

Anyway, if you can point out a way to write this that is completely
standards-conformant, generates decent code (at least as good as this
version), and does not rely on me telling the compiler what the endianness is,
I'm interested in learning.

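For reference, here is a minimal sketch of one way to probe the byte order
without any type punning, by copying the object representation out with
memcpy. It is not the code from this report: byte_layout, probe_layout,
is_little_endian and is_big_endian are hypothetical names, and the sketch
assumes 8-bit bytes and an unsigned integer type without padding bits. Since
it records a full permutation, it would also describe a mixed byte order.
Whether a given gcc version folds this to a constant is not something the
sketch can promise; that would have to be checked in the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pos[k] is the memory offset of the byte with
// significance k, where k = 0 is the least significant byte.
template <typename X>
struct byte_layout {
    std::size_t pos[sizeof(X)];
};

// Build a value whose k-th byte holds the number k, then inspect its object
// representation through memcpy. Assumes 8-bit bytes and an unsigned X.
template <typename X>
byte_layout<X> probe_layout() {
    X probe = 0;
    for (std::size_t k = 0; k < sizeof(X); ++k)
        probe = static_cast<X>(probe | (static_cast<X>(k) << (8 * k)));

    unsigned char raw[sizeof(X)];
    std::memcpy(raw, &probe, sizeof(X));

    byte_layout<X> layout = {};
    for (std::size_t off = 0; off < sizeof(X); ++off) {
        assert(raw[off] < sizeof(X));  // each byte holds its own significance
        layout.pos[raw[off]] = off;
    }
    return layout;
}

// True if the byte of significance k sits at offset k for every k.
template <typename X>
bool is_little_endian() {
    const byte_layout<X> l = probe_layout<X>();
    for (std::size_t k = 0; k < sizeof(X); ++k)
        if (l.pos[k] != k) return false;
    return true;
}

// True if the byte of significance k sits at offset sizeof(X) - 1 - k.
template <typename X>
bool is_big_endian() {
    const byte_layout<X> l = probe_layout<X>();
    for (std::size_t k = 0; k < sizeof(X); ++k)
        if (l.pos[k] != sizeof(X) - 1 - k) return false;
    return true;
}
```

A caller could use probe_layout<uint64_t>().pos[k] to find where byte k of a
native 64-bit integer lives, instead of assuming one of the two usual orders.
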
(I should probably point out that this ought to work even for inconsistent
endianness - I don't recall exactly, but I remember hearing about a cpu that
did something like 3412 byte ordering.)

Just for comparison, my first attempt looked like this:

template < int n, typename X >
struct xword
{
    void operator=(X x) { set(x); };
    operator X() { return get(); };

protected:
    void set(register X x)
    {
        for (register int i = 0; i < n; i++)
        {
            m_x[TRAITS::le ? i : n - i - 1] = x & 0xff;
            x >>= 8;
        };
    };

    X get()
    {
        register X r;
        for (register int i = 0; i < n; i++)
        {
            r <<= 8;
            r |= m_x[TRAITS::le ? n - i - 1 : i];
        };
        return r;
    };

private:
    uint8_t m_x[n];
};

No aliasing issues. Also, no optimization whatsoever.


-- 

kai dot extern at googlemail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kai dot extern at googlemail
                   |                            |dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43197
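
For anyone who wants to try the shift-based approach in isolation, the
following is a self-contained sketch of the same idea as free functions. The
names store_le, load_le, store_be and load_be are hypothetical and not taken
from the report; the accumulator is zero-initialized before shifting, and the
sketch assumes 8-bit bytes and an unsigned integer type. It says nothing
about how well any particular gcc version optimizes these loops.

```cpp
#include <cstddef>
#include <cstdint>

// Store x into dst with the least significant byte first.
template <typename X>
void store_le(unsigned char* dst, X x) {
    for (std::size_t i = 0; i < sizeof(X); ++i) {
        dst[i] = static_cast<unsigned char>(x & 0xff);
        x = static_cast<X>(x >> 8);
    }
}

// Store x into dst with the most significant byte first.
template <typename X>
void store_be(unsigned char* dst, X x) {
    for (std::size_t i = sizeof(X); i-- > 0;) {
        dst[i] = static_cast<unsigned char>(x & 0xff);
        x = static_cast<X>(x >> 8);
    }
}

// Load a value whose least significant byte is at src[0].
template <typename X>
X load_le(const unsigned char* src) {
    X r = 0;  // start from zero so the shifts below are well defined
    for (std::size_t i = sizeof(X); i-- > 0;) {
        r = static_cast<X>(r << 8);
        r = static_cast<X>(r | src[i]);
    }
    return r;
}

// Load a value whose most significant byte is at src[0].
template <typename X>
X load_be(const unsigned char* src) {
    X r = 0;
    for (std::size_t i = 0; i < sizeof(X); ++i) {
        r = static_cast<X>(r << 8);
        r = static_cast<X>(r | src[i]);
    }
    return r;
}
```

Because these functions only use shifts and masks on the value, they behave
the same regardless of the host byte order, so no endianness trait is needed.
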