Paul Brook wrote:
Would it be complicated to implement e.g. __builtin_bswap32 on armv6
with inline semantics (I mean, without generating a library call)?
Probably not.
Paul
It would also be useful to have an inline version for architectures
older than ARMv6 when optimizing for speed (rather than size). What
would you think of something along these lines (32-bit version)?
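unsigned int tmp;	/* scratch register */
__asm__ (	/* pure computation, so no volatile needed */
	"eor %1, %2, %2, ror #16\n\t"
	"bic %1, %1, #0x00ff0000\n\t"
	"mov %0, %2, ror #8\n\t"
	"eor %0, %0, %1, lsr #8"
	: "=r" (x), "=&r" (tmp)	/* "&": tmp is written before x is last read */
	: "r" (x));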
(Four instructions, needing only one scratch register besides the
source. This code comes from NetBSD.)
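The same eor/bic/ror sequence can be written in plain C, which makes the
byte shuffling easier to follow and gives something to test the asm
against. This is only a sketch: `ror32` and `bswap32_xor` are
hypothetical names, and whether a given GCC version turns it back into
exactly these four ARM instructions is not established here. The
comments track the bytes of v = [A B C D], most significant first.

```c
#include <stdint.h>

/* Rotate right by n bits; valid for 0 < n < 32. */
static inline uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Byte swap using the eor / bic / ror / eor trick from the asm above. */
static inline uint32_t bswap32_xor(uint32_t v)
{
    uint32_t t = v ^ ror32(v, 16); /* t = [A^C  B^D  C^A  D^B] */
    t &= ~0x00ff0000u;             /* t = [A^C  0    C^A  D^B] */
    v = ror32(v, 8);               /* v = [D    A    B    C  ] */
    return v ^ (t >> 8);           /* [D  A^A^C  B  C^C^A] = [D C B A] */
}
```

For example, bswap32_xor(0x12345678u) yields 0x78563412u.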
Currently, gcc 4.3.0 (-mcpu=arm7tdmi -O3) generates a call to
__bswapsi2, which disassembles to:
e1a03000 mov r3, r0
e1a00c20 lsr r0, r0, #24
e20328ff and r2, r3, #16711680 ; 0xff0000
e1800c03 orr r0, r0, r3, lsl #24
e1800422 orr r0, r0, r2, lsr #8
e2033cff and r3, r3, #65280 ; 0xff00
e1800403 orr r0, r0, r3, lsl #8
e12fff1e bx lr
which, by the way, is not very different from the current Thumb-2 code
for -mcpu=cortex-m3:
4603 mov r3, r0
f403 027f and.w r2, r3, #16711680 ; 0xff0000
0e00 lsrs r0, r0, #24
ea40 6003 orr.w r0, r0, r3, lsl #24
f403 437f and.w r3, r3, #65280 ; 0xff00
ea40 2012 orr.w r0, r0, r2, lsr #8
ea40 2003 orr.w r0, r0, r3, lsl #8
4770 bx lr
46c0 nop (mov r8, r8)
I haven't checked, but I suspect this is compiled from the standard macro:
#define bswap(x) ((((x) & 0xff000000U) >> 24) | \
(((x) & 0x00ff0000U) >> 8) | (((x) & 0x0000ff00U) << 8) | \
(((x) & 0x000000ffU) << 24))
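A quick way to confirm that the shift-and-mask form is a correct 32-bit
byte swap is to compare it with the builtin on a few values. This sketch
assumes GCC 4.3 or later (or Clang), since __builtin_bswap32 is a
compiler extension; bswap32_mask and matches_builtin are hypothetical
helpers. The constants are unsigned so that the << 24 step cannot
overflow a signed int.

```c
#include <stdint.h>

/* Same expression as the bswap() macro, on an unsigned 32-bit type. */
static inline uint32_t bswap32_mask(uint32_t x)
{
    return ((x & 0xff000000u) >> 24) |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x000000ffu) << 24);
}

/* Returns 1 if bswap32_mask agrees with __builtin_bswap32 (GCC/Clang)
 * for every value in vals[0..n-1], 0 otherwise. */
static int matches_builtin(const uint32_t *vals, int n)
{
    for (int i = 0; i < n; i++)
        if (bswap32_mask(vals[i]) != __builtin_bswap32(vals[i]))
            return 0;
    return 1;
}
```

Calling matches_builtin on a handful of test words (0, 0x12345678,
0xdeadbeef, ...) should return 1.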
- Alexandre