On Thu, 2012-09-20 at 09:12 +0200, Eric Botcazou wrote:
> > The attached patch catches C constructs:
> > (A << 8) | (A >> 8)
> > where A is unsigned 16 bits
> > and maps them to builtin_bswap16(A) which can provide more efficient
> > implementations on some targets.
>
> This belongs in tree-ssa-math-opts.c:execute_optimize_bswap instead.
>
> When I implemented __builtin_bswap16, I didn't add this because I thought this
> would be overkill since the RTL combiner should be able to catch the pattern.
> Have you investigated on this front? But I don't have a strong opinion.
>
A while ago I tried doing that for SH (implementing bswap16 with RTL
combine). It turned into an explosion of patterns, because combine would
try out a lot of different things depending on the code surrounding the
actual bswap16. In the end I decided to drop most of that stuff.
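
For reference, here is a minimal sketch of the source-level idiom from the
patch description next to the built-in it would be mapped to. The function
names are made up for illustration, and it assumes a compiler that provides
__builtin_bswap16:

```c
#include <stdint.h>

/* The hand-written idiom: (A << 8) | (A >> 8) on an unsigned 16-bit value.
   The operands are promoted to int, so the result is truncated back to
   16 bits explicitly.  */
uint16_t swap16_idiom (uint16_t a)
{
  return (uint16_t) ((a << 8) | (a >> 8));
}

/* The same operation expressed through the built-in.  */
uint16_t swap16_builtin (uint16_t a)
{
  return __builtin_bswap16 (a);
}
```

Both functions should compute the same value for every 16-bit input; the
only question is which form gives the target a better chance of emitting
its native byte-swap instruction.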

BTW, the built-in documentation says:

  Built-in Function: int16_t __builtin_bswap16 (int16_t x)
  Built-in Function: int32_t __builtin_bswap32 (int32_t x)

However, it seems the result is always unsigned for those.
At least on SH I get the following:

int test (short x)
{
  return __builtin_bswap16 (x);
}

        swap.b  r4,r4   ! 8     *rotlhi3_8
        rts             ! 24    *return_i
        extu.w  r4,r0   ! 9     *zero_extendhisi2_compact

... and similarly for int32.
Can anyone else confirm this?
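
A target-independent way to check this might be something like the sketch
below. The function names are hypothetical, and the second function relies
on GCC's wrap-around behavior when converting an out-of-range value to
int16_t:

```c
#include <stdint.h>

/* Widen the built-in's result to int without any cast.
   For x = 0x00ff the swapped value is 0xff00: this yields 65280 if the
   built-in's result type is unsigned, and would yield -256 if it were
   really int16_t as documented.  */
int bswap16_to_int (short x)
{
  return __builtin_bswap16 (x);
}

/* Force the signed interpretation, for comparison.  */
int bswap16_to_int_signed (short x)
{
  return (int16_t) __builtin_bswap16 (x);
}
```

If the two functions return different values for an input such as 0x00ff,
the result of the built-in is being zero-extended, which matches the
extu.w in the SH output above.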
Cheers,
Oleg