On Thu, 2012-09-20 at 09:12 +0200, Eric Botcazou wrote:
> > The attached patch catches C constructs:
> > (A << 8) | (A >> 8)
> > where A is unsigned 16 bits
> > and maps them to builtin_bswap16(A) which can provide more efficient
> > implementations on some targets.
>
> This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.
>
> When I implemented __builtin_bswap16, I didn't add this because I thought this
> would be overkill since the RTL combiner should be able to catch the pattern.
> Have you investigated on this front?  But I don't have a strong opinion.
>
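
For reference, the construct under discussion can be compared against the
built-in with a small sketch like the one below.  The helper names
(swap16_shifts, swap16_builtin, check_swap16_forms) are made up for
illustration and are not taken from the patch; it assumes a compiler that
provides __builtin_bswap16.

```c
#include <assert.h>
#include <stdint.h>

/* The open-coded idiom from the patch description: swap the two bytes
   of an unsigned 16-bit value with two shifts and an OR.  The cast
   truncates the int-promoted result back to 16 bits.  */
static uint16_t
swap16_shifts (uint16_t a)
{
  return (uint16_t) ((a << 8) | (a >> 8));
}

/* The same operation expressed through the built-in.  */
static uint16_t
swap16_builtin (uint16_t a)
{
  return __builtin_bswap16 (a);
}

/* Exhaustively check that both forms agree for every 16-bit input.  */
static void
check_swap16_forms (void)
{
  for (uint32_t v = 0; v <= 0xFFFF; v++)
    assert (swap16_shifts ((uint16_t) v) == swap16_builtin ((uint16_t) v));
}
```

Calling check_swap16_forms () only shows that the two forms compute the same
value; whether the compiler recognizes the shift form and emits a single
swap instruction is a separate question that depends on the target and the
pass doing the matching.
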
A while ago I tried doing that for SH (implementing bswap16 with RTL
combine).  It was like an explosion of patterns, because combine would
try out a lot of things depending on the code surrounding the actual
bswap16.  In the end I decided to drop that stuff for the most part.

BTW, the built-in documentation says:

Built-in Function: int16_t __builtin_bswap16 (int16_t x)
Built-in Function: int32_t __builtin_bswap32 (int32_t x)

However, it seems the result is always unsigned for those.  At least on
SH I get the following:

int test (short x)
{
  return __builtin_bswap16 (x);
}

        swap.b  r4,r4   ! 8     *rotlhi3_8
        rts             ! 24    *return_i
        extu.w  r4,r0   ! 9     *zero_extendhisi2_compact

... and similarly for int32.
Can anyone else confirm this?

Cheers,
Oleg
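
A target-independent way to check the signedness question is to look at the
value instead of the generated code.  This is only a sketch: test_as_signed
is a hypothetical helper added for comparison, and the conversion to
int16_t of an out-of-range value is implementation-defined (GCC wraps
modulo 2^16).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the test function above: signed short in, result
   widened to int on return.  */
static int
test (short x)
{
  return __builtin_bswap16 (x);
}

/* What the widened result would look like if the built-in really
   returned int16_t, as the documentation states.  The explicit cast
   forces sign extension when the value is converted to int.  */
static int
test_as_signed (short x)
{
  return (int16_t) __builtin_bswap16 (x);
}
```

With x = 0x00FF the swapped bit pattern is 0xFF00.  If the built-in returns
an unsigned 16-bit type, test returns 65280 (zero extension, matching the
extu.w above), while test_as_signed returns -256.
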