http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145

            Bug ID: 60145
           Summary: [AVR] Suboptimal code for byte order shuffling using
                    shift and or
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl

(Not sure what the component should be, just selected "other" for now)

Using shifts and bitwise-or to compose multiple bytes into a bigger integer
results in suboptimal code on AVR.

For example, consider a few simple functions that take two or four bytes and
compose them into integers (with the first argument as the most significant
byte). Since AVR is an 8-bit platform, this essentially just means moving
the bytes from the argument registers to the return value registers.
However, the generated assembly is significantly bigger than that and
contains obvious optimization opportunities.

The example below also contains a version that uses a union to compose the
integer, which gets optimized as expected (but only works on little-endian
systems, since it relies on the native endianness of uint16_t).
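For reference, a memcpy-based variant (a sketch, not taken from the original
report) behaves like the union version: like that version it relies on the
native little-endian layout of uint16_t, and I have not verified whether
avr-gcc lowers it to plain register moves as well:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical workaround: build the bytes in an array and memcpy them
   into the integer. The {b, a} order assumes a little-endian target,
   just like the union in join2_efficient. */
uint16_t join2_memcpy(uint8_t a, uint8_t b) {
        uint8_t arr[2] = {b, a};
        uint16_t out;
        memcpy(&out, arr, sizeof out);
        return out;
}
```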

matthijs@grubby:~$ cat foo.c
#include <stdint.h>

uint16_t join2(uint8_t a, uint8_t b) {
        return ((uint16_t)a << 8) | b;
}

uint16_t join2_efficient(uint8_t a, uint8_t b) {
        union {
                uint16_t uint;
                uint8_t arr[2];
        } tmp = {.arr = {b, a}};
        return tmp.uint;
}

uint32_t join4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
        return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
               ((uint32_t)c << 8) | d;
}
matthijs@grubby:~$ avr-gcc -c foo.c -O3 && avr-objdump -d foo.o

foo.o:     file format elf32-avr


Disassembly of section .text:

00000000 <join2>:
   0:   70 e0           ldi     r23, 0x00       ; 0
   2:   26 2f           mov     r18, r22
   4:   37 2f           mov     r19, r23
   6:   38 2b           or      r19, r24
   8:   82 2f           mov     r24, r18
   a:   93 2f           mov     r25, r19
   c:   08 95           ret

0000000e <join2_efficient>:
   e:   98 2f           mov     r25, r24
  10:   86 2f           mov     r24, r22
  12:   08 95           ret

00000014 <join4>:
  14:   0f 93           push    r16
  16:   1f 93           push    r17
  18:   02 2f           mov     r16, r18
  1a:   10 e0           ldi     r17, 0x00       ; 0
  1c:   20 e0           ldi     r18, 0x00       ; 0
  1e:   30 e0           ldi     r19, 0x00       ; 0
  20:   14 2b           or      r17, r20
  22:   26 2b           or      r18, r22
  24:   38 2b           or      r19, r24
  26:   93 2f           mov     r25, r19
  28:   82 2f           mov     r24, r18
  2a:   71 2f           mov     r23, r17
  2c:   60 2f           mov     r22, r16
  2e:   1f 91           pop     r17
  30:   0f 91           pop     r16
  32:   08 95           ret
