Re: [RFC] Implement __builtin_bswap* for ARMv6

Alexandre Pereira Nunes Mon, 07 Apr 2008 14:54:11 -0700

Paul Brook wrote:

Would it be complicated to implement e.g. __builtin_bswap32 on armv6
with inline semantics (I mean, without generating a library call)?


Probably not.

Paul

Also, it would be interesting to have an inline version for architectures older than ARMv6 when optimizing for speed (rather than size). What would you think about something more or less like this (for the 32-bit version):


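 __asm __volatile__ (
   "eor    %1, %2, %2, ror #16\n\t"   /* tmp = x ^ ror(x, 16) */
   "bic    %1, %1, #0x00ff0000\n\t"   /* clear bits 16-23 of tmp */
   "mov    %0, %2, ror #8\n\t"        /* x = ror(x, 8) */
   "eor    %0, %0, %1, lsr #8"        /* x ^= tmp >> 8 */
   : "=r" (x), "=&r" (tmp)            /* earlyclobber: tmp is written while x is still needed */
   : "r" (x));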

(Four instructions; clobbers only one register other than the source. This code comes from NetBSD.)
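To make it easier to see why those four instructions swap the bytes, here is a rough portable C rendering of the same sequence. The names ror32 and bswap32_eor are made up for this illustration (they are not from NetBSD or GCC), and the code assumes a 32-bit uint32_t is available; it is only a sketch of the trick, not a proposed implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value right by n bits (0 < n < 32). */
static inline uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Hypothetical helper: C rendering of the four-instruction ARM sequence. */
static inline uint32_t bswap32_eor(uint32_t x)
{
    uint32_t tmp = x ^ ror32(x, 16);   /* eor tmp, x, x, ror #16  */
    tmp &= ~UINT32_C(0x00ff0000);      /* bic tmp, tmp, #0x00ff0000 */
    x = ror32(x, 8);                   /* mov x, x, ror #8        */
    return x ^ (tmp >> 8);             /* eor x, x, tmp, lsr #8   */
}

/* Small sanity check; swapping twice must give back the original value. */
static void bswap32_eor_selfcheck(void)
{
    assert(bswap32_eor(UINT32_C(0xAABBCCDD)) == UINT32_C(0xDDCCBBAA));
    assert(bswap32_eor(bswap32_eor(UINT32_C(0x12345678))) == UINT32_C(0x12345678));
}
```

With x = 0xAABBCCDD, the rotate by 8 gives 0xDDAABBCC, and the masked XOR term fixes up the two bytes that are still in the wrong place, giving 0xDDCCBBAA.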

Currently, GCC 4.3.0 (-mcpu=arm7tdmi -O3) generates a call to __bswapsi2, which translates to this:

e1a03000        mov     r3, r0
e1a00c20        lsr     r0, r0, #24
e20328ff        and     r2, r3, #16711680       ; 0xff0000
e1800c03        orr     r0, r0, r3, lsl #24
e1800422        orr     r0, r0, r2, lsr #8
e2033cff        and     r3, r3, #65280  ; 0xff00
e1800403        orr     r0, r0, r3, lsl #8
e12fff1e        bx      lr

which, by the way, is not very different from the current Thumb-2 version with -mcpu=cortex-m3:


4603            mov     r3, r0
f403 027f       and.w   r2, r3, #16711680       ; 0xff0000
0e00            lsrs    r0, r0, #24
ea40 6003       orr.w   r0, r0, r3, lsl #24
f403 437f       and.w   r3, r3, #65280  ; 0xff00
ea40 2012       orr.w   r0, r0, r2, lsr #8
ea40 2003       orr.w   r0, r0, r3, lsl #8
4770            bx      lr
46c0            nop                     (mov r8, r8)



I didn't look, but I think that this is compiled from the standard macro:
#define bswap(x)     ((((x) & 0xff000000) >> 24) | \
                      (((x) & 0x00ff0000) >>  8) | \
                      (((x) & 0x0000ff00) <<  8) | \
                      (((x) & 0x000000ff) << 24))
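For comparison, the same mask-and-shift approach can be written as a typed inline function, which avoids surprises if the macro is ever applied to a signed or wider argument. This is only a sketch: bswap32_shift is a name invented here, and I am not claiming this is how __bswapsi2 is actually written in libgcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask-and-shift byte swap, same shape as the macro. */
static inline uint32_t bswap32_shift(uint32_t x)
{
    return ((x & UINT32_C(0xff000000)) >> 24) |   /* byte 3 -> byte 0 */
           ((x & UINT32_C(0x00ff0000)) >>  8) |   /* byte 2 -> byte 1 */
           ((x & UINT32_C(0x0000ff00)) <<  8) |   /* byte 1 -> byte 2 */
           ((x & UINT32_C(0x000000ff)) << 24);    /* byte 0 -> byte 3 */
}

/* Small sanity check; swapping twice must give back the original value. */
static void bswap32_shift_selfcheck(void)
{
    assert(bswap32_shift(UINT32_C(0x11223344)) == UINT32_C(0x44332211));
    assert(bswap32_shift(bswap32_shift(UINT32_C(0xdeadbeef))) == UINT32_C(0xdeadbeef));
}
```

The two masks, four shifts and three ORs here line up with the and/lsr/orr instructions in the disassembly above, which is why I suspect the library routine has this shape.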



- Alexandre
