Paul Brook wrote:
Would it be complicated to implement e.g. __builtin_bswap32 on armv6
with inline semantics (I mean, without generating a library call)?

Probably not.

Paul


Also, it would be interesting to have an inline version for architectures older than ARMv6 when optimizing for speed (rather than size). What would you think about something more or less like this (for the 32-bit version):

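 __asm __volatile__ (
   "eor    %1, %2, %2, ror #16\n\t"   /* tmp = x ^ ror(x, 16) */
   "bic    %1, %1, #0x00ff0000\n\t"   /* clear bits 23..16 of tmp */
   "mov    %0, %2, ror #8\n\t"        /* result = ror(x, 8) */
   "eor    %0, %0, %1, lsr #8"        /* result ^= tmp >> 8 */
   : "=r" (x), "=&r" (tmp)   /* early-clobber: tmp is written before the input is last read */
   : "r" (x));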

(Four instructions, and it needs only one scratch register besides the source. This code comes from NetBSD.)
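To make it easier to check what the four instructions do, here is a rough C model of the same sequence. The names ror32 and bswap32_model are only illustrative (they are not from NetBSD or GCC), and this is a sketch for reasoning about the trick, not a proposed implementation.

```c
#include <stdint.h>

/* Rotate right by n bits, valid for 0 < n < 32 (hypothetical helper). */
static inline uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* C model of the eor/bic/mov/eor sequence (illustrative name).
 * With input bytes A B C D (most significant first):            */
static inline uint32_t bswap32_model(uint32_t x)
{
    uint32_t tmp = x ^ ror32(x, 16);  /* A^C  B^D  C^A  D^B          */
    tmp &= ~UINT32_C(0x00ff0000);     /* A^C  00   C^A  D^B          */
    x = ror32(x, 8);                  /* D    A    B    C            */
    return x ^ (tmp >> 8);            /* D    C    B    A  (swapped) */
}
```

For example, bswap32_model(0x12345678) gives 0x78563412.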

Currently, GCC 4.3.0 (-mcpu=arm7tdmi -O3) generates a call to __bswapsi2, which disassembles to this:
e1a03000        mov     r3, r0
e1a00c20        lsr     r0, r0, #24
e20328ff        and     r2, r3, #16711680       ; 0xff0000
e1800c03        orr     r0, r0, r3, lsl #24
e1800422        orr     r0, r0, r2, lsr #8
e2033cff        and     r3, r3, #65280  ; 0xff00
e1800403        orr     r0, r0, r3, lsl #8
e12fff1e        bx      lr


which, by the way, is not very different from the current Thumb-2 version with -mcpu=cortex-m3:

4603            mov     r3, r0
f403 027f       and.w   r2, r3, #16711680       ; 0xff0000
0e00            lsrs    r0, r0, #24
ea40 6003       orr.w   r0, r0, r3, lsl #24
f403 437f       and.w   r3, r3, #65280  ; 0xff00
ea40 2012       orr.w   r0, r0, r2, lsr #8
ea40 2003       orr.w   r0, r0, r3, lsl #8
4770            bx      lr
46c0            nop                     (mov r8, r8)



I didn't look, but I think that this is compiled from the standard macro:
#define bswap(x)     ((((x) & 0xff000000) >> 24) | \
                      (((x) & 0x00ff0000) >>  8) | \
                      (((x) & 0x0000ff00) <<  8) | \
                      (((x) & 0x000000ff) << 24))
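For comparison, the same shift-and-mask logic can be written as a typed function, which evaluates its argument once and pins the operand to an unsigned 32-bit type. The name bswap32_shift_mask is made up for this sketch; I have not checked that this is how __bswapsi2 is actually written in libgcc.

```c
#include <stdint.h>

/* Function form of the shift-and-mask macro (illustrative name). */
static inline uint32_t bswap32_shift_mask(uint32_t x)
{
    return ((x & UINT32_C(0xff000000)) >> 24) |  /* byte 3 -> byte 0 */
           ((x & UINT32_C(0x00ff0000)) >>  8) |  /* byte 2 -> byte 1 */
           ((x & UINT32_C(0x0000ff00)) <<  8) |  /* byte 1 -> byte 2 */
           ((x & UINT32_C(0x000000ff)) << 24);   /* byte 0 -> byte 3 */
}
```

This matches the shape of the disassembly above: two masked middle bytes plus two plain shifts for the outer bytes, combined with orr.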



- Alexandre
